UltraFeedback¶
Rank generations focusing on different aspects using an LLM, following the approach of the paper UltraFeedback: Boosting Language Models with High-quality Feedback.
Attributes¶
- aspect: The aspect on which to evaluate the text outputs with the UltraFeedback model. The available aspects are:
    - helpfulness: Evaluate text outputs based on helpfulness.
    - honesty: Evaluate text outputs based on honesty.
    - instruction-following: Evaluate text outputs based on given instructions.
    - truthfulness: Evaluate text outputs based on truthfulness.
  Additionally, a custom aspect has been defined by Argilla to evaluate the overall assessment of the text outputs within a single prompt:
    - overall-rating: Evaluate text outputs based on an overall assessment.
  Defaults to "overall-rating". A minimal configuration sketch is shown below.
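A minimal sketch of selecting one of the built-in aspects; the model id is just a placeholder, as in the examples further down:

from distilabel.steps.tasks import UltraFeedback
from distilabel.llms.huggingface import InferenceEndpointsLLM

# Evaluate honesty only; any of the aspects listed above can be passed instead.
ultrafeedback = UltraFeedback(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder for your actual LLM
    ),
    aspect="honesty",
)
ultrafeedback.load()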
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph Columns
ICOL0[instruction]
ICOL1[generations]
end
subgraph New columns
OCOL0[rationales]
OCOL1[ratings]
OCOL2[model_name]
OCOL3[types]
end
end
subgraph UltraFeedback
StepInput[Input Columns: instruction, generations]
StepOutput[Output Columns: rationales, ratings, model_name, types]
end
ICOL0 --> StepInput
ICOL1 --> StepInput
StepOutput --> OCOL0
StepOutput --> OCOL1
StepOutput --> OCOL2
StepOutput --> OCOL3
StepInput --> StepOutput
Inputs¶
- instruction (str): The reference instruction to evaluate the text outputs.
- generations (List[str]): The text outputs to evaluate for the given instruction.
Outputs¶
- rationales (List[str], optional): The rationales for the ratings.
- ratings (List[float]): The ratings for each of the provided text outputs.
- model_name (str): The name of the model used to generate the ratings and rationales.
- types (List[str], optional): The types for the ratings.

The sketch below illustrates the row shapes these input and output columns correspond to.
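A minimal sketch of the expected row shapes; the values are illustrative, and rationales and types may be missing depending on the selected aspect and on whether the LLM output could be parsed:

# Input row consumed by the task.
input_row = {
    "instruction": "How much is 2+2?",
    "generations": ["4", "and a car"],
}

# Output row produced by the task: the input columns plus the new ones.
output_row = {
    "instruction": "How much is 2+2?",
    "generations": ["4", "and a car"],
    "ratings": [5, 1],
    "rationales": ["...", "..."],
    "model_name": "mistralai/Mistral-7B-Instruct-v0.2",
}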
Examples¶
Rate generations from different LLMs based on the selected aspect¶
from distilabel.steps.tasks import UltraFeedback
from distilabel.llms.huggingface import InferenceEndpointsLLM
# Consider this as a placeholder for your actual LLM.
ultrafeedback = UltraFeedback(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    use_default_structured_output=False
)

ultrafeedback.load()

result = next(
    ultrafeedback.process(
        [
            {
                "instruction": "How much is 2+2?",
                "generations": ["4", "and a car"],
            }
        ]
    )
)
# result
# [
# {
# 'instruction': 'How much is 2+2?',
# 'generations': ['4', 'and a car'],
# 'ratings': [1, 2],
# 'rationales': ['explanation for 4', 'explanation for and a car'],
# 'model_name': 'mistralai/Mistral-7B-Instruct-v0.2',
# }
# ]
Rate generations from different LLMs based on helpfulness, using the default structured output¶
from distilabel.steps.tasks import UltraFeedback
from distilabel.llms.huggingface import InferenceEndpointsLLM
# Consider this as a placeholder for your actual LLM.
ultrafeedback = UltraFeedback(
    llm=InferenceEndpointsLLM(
        model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
        generation_kwargs={"max_new_tokens": 512},
    ),
    aspect="helpfulness",
    # Use the task's default structured output so the ratings, rationales
    # and types come back as parseable JSON (as shown in the result below).
    use_default_structured_output=True,
)

ultrafeedback.load()

result = next(
    ultrafeedback.process(
        [
            {
                "instruction": "How much is 2+2?",
                "generations": ["4", "and a car"],
            }
        ]
    )
)
# result
# [{'instruction': 'How much is 2+2?',
# 'generations': ['4', 'and a car'],
# 'ratings': [1, 5],
# 'rationales': ['Text 1 is clear and relevant, providing the correct answer to the question. It is also not lengthy and does not contain repetition. However, it lacks comprehensive information or detailed description.',
# 'Text 2 is neither clear nor relevant to the task. It does not provide any useful information and seems unrelated to the question.'],
# 'rationales_for_rating': ['Text 1 is rated as Correct (3) because it provides the accurate answer to the question, but lacks comprehensive information or detailed description.',
# 'Text 2 is rated as Severely Incorrect (1) because it does not provide any relevant information and seems unrelated to the question.'],
# 'types': [1, 3, 1],
# 'distilabel_metadata': {'raw_output_ultra_feedback_0': '{
"ratings": [
1,
5
]
,
"rationales": [
"Text 1 is clear and relevant, providing the correct answer to the question. It is also not lengthy and does not contain repetition. However, it lacks comprehensive information or detailed description.",
"Text 2 is neither clear nor relevant to the task. It does not provide any useful information and seems unrelated to the question."
]
,
"rationales_for_rating": [
"Text 1 is rated as Correct (3) because it provides the accurate answer to the question, but lacks comprehensive information or detailed description.",
"Text 2 is rated as Severely Incorrect (1) because it does not provide any relevant information and seems unrelated to the question."
]
,
"types": [
1, 3,
1
]
}'},
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]
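The ratings are parallel to the generations, so selecting the preferred output per row is plain Python over the returned rows. A minimal sketch, assuming result holds the rows shown above (this is not part of the distilabel API, just an illustrative post-processing step):

# Pick the highest-rated generation for each row.
for row in result:
    ratings = row["ratings"]
    # Skip rows where the LLM output could not be parsed (ratings may contain None).
    if ratings and all(r is not None for r in ratings):
        best_idx = max(range(len(ratings)), key=lambda i: ratings[i])
        print(row["instruction"], "->", row["generations"][best_idx])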