ultrajudge
Area
UltraJudgeOutput
UltraJudgeTask
dataclass
Bases: PreferenceTask
A PreferenceTask for the UltraJudge task. The UltraJudge task was defined at Argilla specifically to improve evaluation with AI Feedback. It is based on both UltraFeedback and JudgeLM, with several improvements and modifications.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | The system prompt to be used for generation. | "You are an evaluator tasked with assessing AI assistants' responses from the perspective of typical user preferences. Your critical analysis should focus on human-like engagement, solution effectiveness, accuracy, clarity, and creativity. Approach each response as if you were the user, considering how well the response meets your needs and expectations in a real-world scenario. Provide detailed feedback that highlights strengths and areas for improvement in each response, keeping in mind the goal of simulating a human's preferred choice. Your evaluation should be impartial and thorough, reflecting a human's perspective in preferring responses that are practical, clear, authentic, and aligned with their intent. Avoid bias, and focus on the content and quality of the responses." |
task_description | Union[str, None] | The description of the task. | "Your task is to rigorously evaluate the performance of {num_responses} AI assistants, simulating a human's perspective. You will assess each response based on four key domains, reflecting aspects that are typically valued by humans: {areas}. First provide a score between 0 and 10 and write a detailed feedback for each area and assistant. Finally, provide a list of {num_responses} scores, each separated by a space, to reflect the performance of Assistants 1 to {num_responses}." |
areas | List[str] | The areas to be used for the task. Defaults to a list of four areas: "Practical Accuracy", "Clarity & Transparency", "Authenticity & Reliability", and "Compliance with Intent". | field(default_factory=lambda: ['Practical Accuracy', 'Clarity & Transparency', 'Authenticity & Reliability', 'Compliance with Intent']) |
References
Source code in src/distilabel/tasks/preference/ultrajudge.py
areas_str: str
property
Returns a string representation of the areas.
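As a rough illustration of what such a string representation could look like, and how it might fill the `{areas}` placeholder of the task description, here is a minimal sketch. The helpers `join_areas` and `fill_template` and the shortened `TEMPLATE` are hypothetical and not taken from the distilabel source; the actual property may format the list differently.

```python
from typing import List

# Default areas, as listed in the parameters of UltraJudgeTask.
AREAS: List[str] = [
    "Practical Accuracy",
    "Clarity & Transparency",
    "Authenticity & Reliability",
    "Compliance with Intent",
]

# Abbreviated stand-in for the task description (assumption, not the real default).
TEMPLATE = "Assess each of the {num_responses} responses on: {areas}."


def join_areas(areas: List[str]) -> str:
    """Hypothetical helper: join area names into one readable phrase."""
    if not areas:
        raise ValueError("at least one area is required")
    if len(areas) == 1:
        return areas[0]
    return ", ".join(areas[:-1]) + ", and " + areas[-1]


def fill_template(template: str, areas: List[str], num_responses: int) -> str:
    """Hypothetical helper: substitute the placeholders used by the template."""
    return template.format(areas=join_areas(areas), num_responses=num_responses)


if __name__ == "__main__":
    print(fill_template(TEMPLATE, AREAS, num_responses=2))
```

Joining the areas once and reusing the result keeps the prompt text consistent with whatever list the user passes in `areas`.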
extract_area_score_and_rationale_regex: str
property
Returns a regex to extract the area, score, and rationale from the output.
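A pattern of this kind could be built from the configured areas roughly as follows. This is a sketch under an assumed output layout, where each area appears as `<area> - <score>` on one line followed by its rationale and a blank line; the exact pattern returned by the property may differ. `build_area_regex` and `extract_areas` are hypothetical names.

```python
import re
from typing import List, Tuple


def build_area_regex(areas: List[str]) -> str:
    """Build a pattern matching '<area> - <score>' followed by a rationale.

    Assumed layout (not confirmed by the source): the rationale starts on the
    next line and ends at a blank line or at the end of the text.
    """
    # Escape each name so characters such as '&' are matched literally.
    names = "|".join(re.escape(area) for area in areas)
    return rf"({names})\s*-\s*(\d+(?:\.\d+)?)\n(.*?)(?=\n\n|\Z)"


def extract_areas(output: str, areas: List[str]) -> List[Tuple[str, str, str]]:
    """Return (area, score, rationale) tuples in order of appearance."""
    # DOTALL lets a rationale span several lines until the blank line.
    return re.findall(build_area_regex(areas), output, flags=re.DOTALL)


if __name__ == "__main__":
    sample = (
        "Practical Accuracy - 8\nThe answer is correct.\n\n"
        "Clarity & Transparency - 7.5\nClear but terse."
    )
    for row in extract_areas(sample, ["Practical Accuracy", "Clarity & Transparency"]):
        print(row)
```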
extract_final_scores_regex: str
property
Returns a regex to extract the final scores from the output.
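The task description asks the model to finish with a list of scores separated by spaces. One way to pull that list out is sketched below; the `Final scores:` label and the names `FINAL_SCORES_REGEX` and `extract_final_scores` are assumptions made for illustration, and the real pattern may anchor on different text.

```python
import re
from typing import List

# Assumed label preceding the space-separated scores (not confirmed by the source).
FINAL_SCORES_REGEX = r"Final scores:\s*((?:\d+(?:\.\d+)?\s*)+)"


def extract_final_scores(output: str) -> List[float]:
    """Return the final scores as floats, or an empty list if none are found."""
    match = re.search(FINAL_SCORES_REGEX, output)
    if match is None:
        return []
    # The captured group is a whitespace-separated run of numbers.
    return [float(score) for score in match.group(1).split()]


if __name__ == "__main__":
    print(extract_final_scores("Some feedback...\n\nFinal scores: 8 6.5"))
```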
output_args_names: List[str]
property
Returns the names of the output arguments of the task.
generate_prompt(input, generations, **_)
Generates a prompt following the UltraJudge specification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt. | required |
Returns:
Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.preference import UltraJudgeTask
>>> task = UltraJudgeTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
    system_prompt="You are a helpful assistant.",
    formatted_prompt="Your task is to rigorously evaluate the performance of ...",
)
Source code in src/distilabel/tasks/preference/ultrajudge.py
parse_output(output)
Parses the output of the model into the desired format.
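One plausible shape for this step is to group the flat list of per-area matches into one result per assistant, pairing each group with its final score. The sketch below assumes the matches arrive in assistant order with the same number of areas per assistant. The classes `AreaResult` and `JudgeResult` and the function `group_results` are hypothetical stand-ins, not the library's `Area` or `UltraJudgeOutput` types.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AreaResult:
    """Hypothetical stand-in for a per-area score and rationale."""
    rating: float
    rationale: str


@dataclass
class JudgeResult:
    """Hypothetical stand-in for one assistant's parsed evaluation."""
    rating: float
    areas: Dict[str, AreaResult]


def group_results(
    area_matches: List[Tuple[str, str, str]],
    final_scores: List[float],
    num_areas: int,
) -> List[JudgeResult]:
    """Slice the flat (area, score, rationale) list into one chunk per assistant."""
    if len(area_matches) != len(final_scores) * num_areas:
        raise ValueError("number of area matches does not fit the final scores")
    results = []
    for i, rating in enumerate(final_scores):
        chunk = area_matches[i * num_areas : (i + 1) * num_areas]
        areas = {
            name: AreaResult(rating=float(score), rationale=rationale)
            for name, score, rationale in chunk
        }
        results.append(JudgeResult(rating=rating, areas=areas))
    return results


if __name__ == "__main__":
    matches = [("Practical Accuracy", "8", "Correct."), ("Practical Accuracy", "5", "Partly wrong.")]
    print(group_results(matches, [8.0, 5.0], num_areas=1))
```

Raising on a count mismatch makes truncated or malformed model output visible instead of silently attaching rationales to the wrong assistant.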