tasks
CritiqueTask (dataclass)
Bases: RatingToArgillaMixin, Task

A Task for critique / judge tasks.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | required
`task_description` | `Union[str, None]` | the description of the task. Defaults to `None`. | `None`
Source code in src/distilabel/tasks/critique/base.py
input_args_names: List[str] (property)
Returns the names of the input arguments of the task.

output_args_names: List[str] (property)
Returns the names of the output arguments of the task.
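These properties determine which dataset columns a task consumes and produces. As a sketch, they can be inspected on a concrete subclass such as UltraCMTask; the exact values shown below are assumptions based on the critique output format, not guaranteed by this reference:
>>> from distilabel.tasks.critique import UltraCMTask
>>> task = UltraCMTask()
>>> task.input_args_names  # assumed values
['input', 'generations']
>>> task.output_args_names  # assumed values
['critique', 'score']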
JudgeLMTask (dataclass)
Bases: PreferenceTask

A PreferenceTask following the prompt template used by JudgeLM.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | `'You are a helpful and precise assistant for checking the quality of the answer.'`
`task_description` | `Union[str, None]` | the description of the task. | `'We would like to request your feedback on the performance of {num_responses} AI assistants in response to the user question displayed above.\nPlease rate the helpfulness, relevance, accuracy, level of details of their responses. Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.\nPlease first output a single line containing only {num_responses} values indicating the scores for Assistants 1 to {num_responses}, respectively. The {num_responses} scores are separated by a space. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.'`
References
Source code in src/distilabel/tasks/preference/judgelm.py
generate_prompt(input, generations, **_)
Generates a prompt following the JudgeLM specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
`generations` | `List[str]` | the generations to be used for the prompt. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.preference import JudgeLMTask
>>> task = JudgeLMTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="[Question] What are the first 5 Fibonacci numbers? ...",
)
Source code in src/distilabel/tasks/preference/judgelm.py
parse_output(output)
Parses the output of the model into the desired format.
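As a hedged sketch of the parsing step: JudgeLM outputs a line of scores followed by an explanation (see the task_description above), so the parsed result could resemble the following, although the exact return container and key names are assumptions, not documented here:
>>> from distilabel.tasks.preference import JudgeLMTask
>>> task = JudgeLMTask()
>>> task.parse_output("7 9\nAssistant 2 gave a more detailed and accurate answer...")
{'rating': [7.0, 9.0], 'rationale': 'Assistant 2 gave a more detailed and accurate answer...'}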
Source code in src/distilabel/tasks/preference/judgelm.py
PrometheusTask (dataclass)
Bases: CritiqueTask

A CritiqueTask following the prompt template used by Prometheus.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | `'You are a fair evaluator language model.'`
`scoring_criteria` | `str` | the scoring criteria to be used for the task, which defines the scores below, provided via `score_descriptions`. | required
`score_descriptions` | `Dict[int, str]` | the descriptions of the scores, where the key is the rating value (ideally these should be consecutive) and the value is the description of each rating. | required
Disclaimer
Since the Prometheus model has been trained on data generated with the OpenAI API, its prompting strategy may only be consistent / compliant with either GPT-3.5 or GPT-4 from the OpenAI API, or with their own model. Any other model may fail to generate a structured output, or may provide an incorrect / inaccurate critique.
References
Source code in src/distilabel/tasks/critique/prometheus.py
generate_prompt(input, generations, ref_completion, **_)
Generates a prompt following the Prometheus specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
`generations` | `List[str]` | the generations to be used for the prompt, in this case, the ones to be critiqued. | required
`ref_completion` | `str` | the reference completion to be used for the prompt, assumed to be the one with the highest score. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.critique import PrometheusTask
>>> task = PrometheusTask(
... scoring_criteria="Overall quality of the responses provided.",
... score_descriptions={0: "false", 1: "partially false", 2: "average", 3: "partially true", 4: "true"},
... )
>>> task.generate_prompt(
... input="What are the first 5 Fibonacci numbers?",
... generations=["0 1 1 2 3", "0 1 1 2 3"],
... ref_completion="0 1 1 2 3",
... )
Prompt(
system_prompt="You are a fair evaluator language model.",
formatted_prompt=""###Task Description:...",
)
Source code in src/distilabel/tasks/critique/prometheus.py
parse_output(output)
Parses the output of the model into the desired format.
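As a hedged sketch: Prometheus-style outputs provide the feedback followed by a result marker and the score, so parsing could resemble the following (the [RESULT] separator, return container, and key names are assumptions, not documented here):
>>> task.parse_output("The response lists the correct numbers. [RESULT] 4")
{'critique': 'The response lists the correct numbers.', 'score': 4.0}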
Source code in src/distilabel/tasks/critique/prometheus.py
Prompt (dataclass)

A dataclass representing a Prompt.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt. | required
`formatted_prompt` | `str` | the formatted prompt. | required
Examples:
>>> from distilabel.tasks.prompt import Prompt
>>> prompt = Prompt(
... system_prompt="You are a helpful assistant.",
... formatted_prompt="What are the first 5 Fibonacci numbers?",
... )
Source code in src/distilabel/tasks/prompt.py
format_as(format)
Formats the prompt as the specified format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format |
SupportedFormats
|
the format to be used for the prompt. Available formats are
|
required |
Returns:
Type | Description
---|---
`Union[str, List[ChatCompletion]]` | the formatted prompt.
Raises:
Type | Description
---|---
`ValueError` | if the specified format is not supported.
Examples:
>>> from distilabel.tasks.prompt import Prompt
>>> prompt = Prompt(
... system_prompt="You are a helpful assistant.",
... formatted_prompt="What are the first 5 Fibonacci numbers?",
... )
>>> prompt.format_as("default")
'You are a helpful assistant. What are the first 5 Fibonacci numbers?'
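Assuming "openai" is also among the supported formats, which would be consistent with the List[ChatCompletion] return type above (the full list of formats is truncated in this reference), the same prompt could be formatted as chat messages:
>>> prompt.format_as("openai")  # assumed format name and output shape
[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'What are the first 5 Fibonacci numbers?'}]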
Source code in src/distilabel/tasks/prompt.py
SelfInstructTask (dataclass)
Bases: TextGenerationTask

A TextGenerationTask following the Self-Instruct specification for building the prompts.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used. | `'You are an expert prompt writer, writing the best and most diverse prompts for a variety of tasks. You are given a task description and a set of instructions for how to write the prompts for an specific AI application.'`
`principles` | `Dict[str, List[str]]` | the principles to be used for the system prompt. | `field(default_factory=lambda: {'harmlessness': harmlessness, 'helpfulness': helpfulness, 'truthfulness': truthfulness, 'honesty': honesty, 'verbalized_calibration': verbalized_calibration}, repr=False)`
`principles_distribution` | `Union[Dict[str, float], Literal['balanced'], None]` | the distribution of principles to be used for the system prompt. | `None`
`application_description` | `str` | the description of the AI application. | `'AI assistant'`
`num_instructions` | `int` | the number of instructions to be used for the prompt. | `5`
References
Source code in src/distilabel/tasks/text_generation/self_instruct.py
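For instance, using only the parameters documented above (the application description value here is purely illustrative):
>>> from distilabel.tasks.text_generation import SelfInstructTask
>>> task = SelfInstructTask(
...     application_description="An AI assistant that solves math word problems.",
...     num_instructions=3,
... )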
generate_prompt(input, **_)
Generates a prompt following the Self-Instruct specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.text_generation import SelfInstructTask
>>> task = SelfInstructTask(system_prompt="You are a helpful assistant.", num_instructions=2)
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?")
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="# Task Description ...",
)
Source code in src/distilabel/tasks/text_generation/self_instruct.py
parse_output(output)
Parses the output of the model into the desired format.
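As a sketch, assuming the model replies with a numbered list of instructions and that the output key matches the instructions_column default used by to_argilla_record below (both are assumptions):
>>> task.parse_output("1. Solve 2 + 2.\n2. Explain prime numbers.")
{'instructions': ['Solve 2 + 2.', 'Explain prime numbers.']}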
to_argilla_record(dataset_row, instructions_column='instructions')
Converts a dataset row to a list of Argilla FeedbackRecords.
Source code in src/distilabel/tasks/text_generation/self_instruct.py
Task
Bases: ABC

Abstract class used to define the methods required to create a Task, to be used within an LLM.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | required
`task_description` | `Union[str, None]` | the description of the task. Defaults to `None`. | required
Raises:
Type | Description
---|---
`ValueError` | if the ...
Source code in src/distilabel/tasks/base.py
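Since Task is abstract, a concrete task must supply the prompting and parsing logic. A minimal sketch, assuming that the documented interface (generate_prompt, parse_output, and the *_args_names properties) is all that must be implemented, and that the constructor accepts system_prompt as the Parameters table above suggests; EchoTask is a hypothetical name:
>>> from distilabel.tasks.base import Task
>>> from distilabel.tasks.prompt import Prompt
>>> class EchoTask(Task):  # hypothetical example subclass
...     @property
...     def input_args_names(self):
...         return ["input"]
...     @property
...     def output_args_names(self):
...         return ["generations"]
...     def generate_prompt(self, input, **_):
...         # forward the raw input as the formatted prompt
...         return Prompt(system_prompt=self.system_prompt, formatted_prompt=input)
...     def parse_output(self, output):
...         # map the raw model output onto the declared output args
...         return {"generations": [output]}
>>> task = EchoTask(system_prompt="You are a helpful assistant.")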
validate_dataset(columns_in_dataset)
Validates that the dataset contains the required columns for the task.
Parameters:
Name | Type | Description | Default
---|---|---|---
`columns_in_dataset` | `List[str]` | the columns in the dataset. | required
Raises:
Type | Description
---|---
`KeyError` | if the dataset does not contain the required columns.
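For example (the column names here are illustrative; the required columns are the task's input_args_names):
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> task = TextGenerationTask()
>>> task.validate_dataset(["input"])  # required column present, no error raised
>>> task.validate_dataset(["question"])  # "input" is missing
Traceback (most recent call last):
...
KeyError: ...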
Source code in src/distilabel/tasks/base.py
TextGenerationTask (dataclass)
Bases: Task

A base Task definition for text generation using LLMs.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used. | `"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."`
`principles` | `Dict[str, List[str]]` | the principles to be used for the system prompt. | `field(default_factory=lambda: {'harmlessness': harmlessness, 'helpfulness': helpfulness, 'truthfulness': truthfulness, 'honesty': honesty, 'verbalized_calibration': verbalized_calibration}, repr=False)`
`principles_distribution` | `Union[Dict[str, float], Literal['balanced'], None]` | the distribution of principles to be used for the system prompt. | `None`
Examples:
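A minimal sketch using only the parameters documented above; "balanced" presumably spreads the listed principles uniformly across the generated prompts:
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> task = TextGenerationTask(
...     system_prompt="You are a helpful assistant.",
...     principles_distribution="balanced",
... )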
Source code in src/distilabel/tasks/text_generation/base.py
input_args_names: List[str] (property)
Returns the input args names for the task.

output_args_names: List[str] (property)
Returns the output args names for the task.

__post_init__()
Validates the principles_distribution if it is a dict.
Raises:
Type | Description
---|---
`ValueError` | if the `principles_distribution` ...
`ValueError` | if the `principles_distribution` ...
Source code in src/distilabel/tasks/text_generation/base.py
generate_prompt(input, **_)
Generates the prompt to be used for generation.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for generation. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> task = TextGenerationTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?")
Prompt(system_prompt='You are a helpful assistant.', formatted_prompt='What are the first 5 Fibonacci numbers?')
Source code in src/distilabel/tasks/text_generation/base.py
parse_output(output)
Parses the output of the model into the desired format.
to_argilla_record(dataset_row)
Converts a dataset row to an Argilla FeedbackRecord.
Source code in src/distilabel/tasks/text_generation/base.py
UltraCMTask (dataclass)
Bases: CritiqueTask

A CritiqueTask following the prompt template used by UltraCM (from UltraFeedback).
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | `"User: A one-turn chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, very detailed, and polite answers to the user's questions.</s>"`
Disclaimer
Since the UltraCM model has been trained on data generated with the OpenAI API, its prompting strategy may only be consistent / compliant with either GPT-3.5 or GPT-4 from the OpenAI API, or with their own model. Any other model may fail to generate a structured output, or may provide an incorrect / inaccurate critique.
References
Source code in src/distilabel/tasks/critique/ultracm.py
generate_prompt(input, generations, **_)
Generates a prompt following the UltraCM specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
`generations` | `List[str]` | the generations to be used for the prompt, in this case, the ones to be critiqued. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.critique import UltraCMTask
>>> task = UltraCMTask()
>>> task.generate_prompt(
... input="What are the first 5 Fibonacci numbers?",
... generations=["0 1 1 2 3", "0 1 1 2 3"],
... )
Prompt(
system_prompt="User: A one-turn chat between a curious user ...",
formatted_prompt="User: Given my answer to an instruction, your role ...",
)
Source code in src/distilabel/tasks/critique/ultracm.py
parse_output(output)
Parses the output of the model into the desired format.
Source code in src/distilabel/tasks/critique/ultracm.py
UltraFeedbackTask (dataclass)
Bases: PreferenceTask

A PreferenceTask following the prompt template used by ULTRAFEEDBACK.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | `'Your role is to evaluate text quality based on given criteria.'`
`task_description` | `Union[str, None]` | the description of the task. Defaults to ... | required
`ratings` | `Union[List[Rating], None]` | the ratings to be used for the task. Defaults to ... | required
References
Source code in src/distilabel/tasks/preference/ultrafeedback.py
for_overall_quality(system_prompt=None, task_description=None, ratings=None) (classmethod)
Classmethod for the UltraFeedbackTask subtask defined by Argilla, in order to evaluate all the criteria originally defined in UltraFeedback at once, in a single subtask.
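For instance (the resulting system_prompt value matches the default documented in the Parameters table above):
>>> from distilabel.tasks.preference import UltraFeedbackTask
>>> task = UltraFeedbackTask.for_overall_quality()
>>> task.system_prompt
'Your role is to evaluate text quality based on given criteria.'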
Source code in src/distilabel/tasks/preference/ultrafeedback.py
generate_prompt(input, generations, **_)
Generates a prompt following the ULTRAFEEDBACK specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
`generations` | `List[str]` | the generations to be used for the prompt. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.preference import UltraFeedbackTask
>>> task = UltraFeedbackTask.for_overall_quality()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="Your role is to evaluate text quality based on given criteria.",
formatted_prompt="# General Text Quality Assessment...",
)
Source code in src/distilabel/tasks/preference/ultrafeedback.py
parse_output(output)
Parses the output of the model into the desired format.
Source code in src/distilabel/tasks/preference/ultrafeedback.py
UltraJudgeTask (dataclass)
Bases: PreferenceTask

A PreferenceTask for the UltraJudge task. The UltraJudge task has been defined at Argilla specifically for a better evaluation using AI Feedback. The task is defined based on both UltraFeedback and JudgeLM, but with several improvements / modifications.
Parameters:
Name | Type | Description | Default
---|---|---|---
`system_prompt` | `str` | the system prompt to be used for generation. | `"You are an evaluator tasked with assessing AI assistants' responses from the perspective of typical user preferences. Your critical analysis should focus on human-like engagement, solution effectiveness, accuracy, clarity, and creativity. Approach each response as if you were the user, considering how well the response meets your needs and expectations in a real-world scenario. Provide detailed feedback that highlights strengths and areas for improvement in each response, keeping in mind the goal of simulating a human's preferred choice. Your evaluation should be impartial and thorough, reflecting a human's perspective in preferring responses that are practical, clear, authentic, and aligned with their intent. Avoid bias, and focus on the content and quality of the responses."`
`task_description` | `Union[str, None]` | the description of the task. | `"Your task is to rigorously evaluate the performance of {num_responses} AI assistants, simulating a human's perspective. You will assess each response based on four key domains, reflecting aspects that are typically valued by humans: {areas}. First provide a score between 0 and 10 and write a detailed feedback for each area and assistant. Finally, provide a list of {num_responses} scores, each separated by a space, to reflect the performance of Assistants 1 to {num_responses}."`
`areas` | `List[str]` | the areas to be used for the task. Defaults to a list of four areas: "Practical Accuracy", "Clarity & Transparency", "Authenticity & Reliability", and "Compliance with Intent". | `field(default_factory=lambda: ['Practical Accuracy', 'Clarity & Transparency', 'Authenticity & Reliability', 'Compliance with Intent'])`
References
Source code in src/distilabel/tasks/preference/ultrajudge.py
areas_str: str (property)
Returns a string representation of the areas.

extract_area_score_and_rationale_regex: str (property)
Returns a regex to extract the area, score, and rationale from the output.

extract_final_scores_regex: str (property)
Returns a regex to extract the final scores from the output.

output_args_names: List[str] (property)
Returns the names of the output arguments of the task.
generate_prompt(input, generations, **_)
Generates a prompt following the UltraJudge specification.
Parameters:
Name | Type | Description | Default
---|---|---|---
`input` | `str` | the input to be used for the prompt. | required
`generations` | `List[str]` | the generations to be used for the prompt. | required
Returns:
Name | Type | Description
---|---|---
`Prompt` | `Prompt` | the generated prompt.
Examples:
>>> from distilabel.tasks.preference import UltraJudgeTask
>>> task = UltraJudgeTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="Your task is to rigorously evaluate the performance of ...",
)
Source code in src/distilabel/tasks/preference/ultrajudge.py
parse_output(output)
Parses the output of the model into the desired format.