tasks
ComplexityScorerTask
dataclass
¶
Bases: PreferenceTaskNoRationale
A PreferenceTask following the Complexity Scorer specification for rating instructions in terms of complexity.
This task is inspired by the Evol Complexity Scorer in the Deita framework: Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).
The task is defined as follows: ask an LLM (the original paper used ChatGPT) to rate a set of instructions (the number of instructions is dynamic, so you can compare any number of them; in Deita they chose 6) and obtain a complexity score c for each instruction.
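As an illustration of how the resulting scores can be used downstream (this is plain Python over hypothetical, already-parsed scores, not part of the distilabel API), instructions can simply be filtered by their complexity score c:
>>> scores = {"instruction 1": 2, "instruction 2": 5}  # hypothetical parsed complexity scores
>>> threshold = 3  # hypothetical cut-off
>>> [instruction for instruction, c in scores.items() if c >= threshold]
['instruction 2']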
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. Not defined for this task. | '' |
Source code in src/distilabel/tasks/preference/complexity_scorer.py
input_args_names: List[str]
property
¶
Returns the names of the input arguments of the task.
generate_prompt(generations, **_)
¶
Generates a prompt following the Evol Complexity specification in Deita.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
generations | List[str] | the generations to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks import ComplexityScorerTask
>>> task = ComplexityScorerTask()
>>> task.generate_prompt(["instruction 1", "instruction 2"])
Prompt(system_prompt="", formatted_prompt="Ranking the following questions...")
Source code in src/distilabel/tasks/preference/complexity_scorer.py
parse_output(output)
¶
Parses the output of the task, returning a list with the rank/score of each instruction.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output | str | the raw output of the LLM. | required |
Returns:

Type | Description |
---|---|
Dict[str, List[str]] | a dict containing the ranks/scores of each instruction. |
Source code in src/distilabel/tasks/preference/complexity_scorer.py
CritiqueTask
dataclass
¶
Bases: RatingToArgillaMixin, Task
A Task for critique / judge tasks.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | required |
task_description | Union[str, None] | the description of the task. Defaults to None. | None |
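The concrete critique tasks documented below, PrometheusTask and UltraCMTask, derive from this base class. A quick illustrative check (assuming both classes are importable from distilabel.tasks, as this reference suggests):
>>> from distilabel.tasks import CritiqueTask, UltraCMTask
>>> issubclass(UltraCMTask, CritiqueTask)
True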
Source code in src/distilabel/tasks/critique/base.py
EvolComplexityTask
dataclass
¶
Bases: EvolInstructTask
A TextGenerationTask following the EvolComplexity specification for building prompts. This is a special case of the original EvolInstructTask, where the evolution method is fixed to "constraints", "deepen", "concretizing" or "reasoning".
Additionally, an elimination step should be executed to screen out instructions that are not useful.
From the reference repository: Evol-Instruct is a novel method using LLMs instead of humans to automatically mass-produce open-domain instructions of various difficulty levels and skills range, to improve the performance of LLMs.
The task is defined as follows: Starting from an initial (simpler) instruction, select in-depth or in-breadth evolving to upgrade the simple instruction to a more complex one or create a new one (to increase diversity). The In-depth Evolving includes the following operations: "constraints", "deepen", "concretizing" or "reasoning". The In-breadth Evolving is mutation, i.e., generating a completely new instruction based on the given instruction.
Since the evolved instructions are generated by LLMs, sometimes the evolving will fail. We adopt an instruction eliminator to filter out the failed instructions, a step called Elimination Evolving, but we don't apply the step of asking the LLM again when the answer is a copy of the prompt that was used.
This evolutionary process can be repeated for several rounds to obtain instruction data of varying complexity. Currently the task is implemented as a single step, so to generate multiple evolutions you can "repeat" the instructions in the original dataset. An example can be seen in the following script: examples/pipeline-evol-instruct-alpaca.py
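A minimal sketch of building prompts for successive evolution rounds with this task (assuming EvolComplexityTask is importable from distilabel.tasks.text_generation like the other tasks on this page; the LLM call that produces each evolved instruction is omitted):
>>> from distilabel.tasks.text_generation import EvolComplexityTask
>>> task = EvolComplexityTask()
>>> instruction = "Give three tips for staying healthy."
>>> for method in ("constraints", "deepen"):  # any of the four in-depth methods
...     prompt = task.generate_prompt(instruction, evolution_method=method)
...     # send `prompt` to an LLM and use its answer as the next `instruction`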
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. Not defined for this task. | '' |
References
Source code in src/distilabel/tasks/text_generation/evol_complexity.py
generate_prompt(input, evolution_method=None, **_)
¶
Generates a prompt following the Evol-Complexity specification of the Deita Paper.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
evolution_method | str | the evolution method to be used. If not provided (the default), a random one is chosen, as in the original paper. Available ones are "constraints", "deepen", "concretizing" or "reasoning". | None |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.text_generation import EvolComplexityGeneratorTask
>>> task = EvolComplexityGeneratorTask()
>>> task.generate_prompt("Give three tips for staying healthy.")
Prompt(
system_prompt="",
formatted_prompt="I want you to act as a Prompt ...",
)
Source code in src/distilabel/tasks/text_generation/evol_complexity.py
EvolInstructTask
dataclass
¶
Bases: InstructTaskMixin, TextGenerationTask
A TextGenerationTask following the EvolInstruct specification for building the prompts.
From the reference repository: Evol-Instruct is a novel method using LLMs instead of humans to automatically mass-produce open-domain instructions of various difficulty levels and skills range, to improve the performance of LLMs.
The task is defined as follows: Starting from an initial (simpler) instruction, select in-depth or in-breadth evolving to upgrade the simple instruction to a more complex one or create a new one (to increase diversity). The In-depth Evolving includes the following operations: add constraints, deepening, concretizing and increase reasoning. The In-breadth Evolving is mutation, i.e., generating a completely new instruction based on the given instruction.
Since the evolved instructions are generated by LLMs, sometimes the evolving will fail. We adopt an instruction eliminator to filter out the failed instructions, a step called Elimination Evolving, but we don't apply the step of asking the LLM again when the answer is a copy of the prompt that was used.
This evolutionary process can be repeated for several rounds to obtain instruction data of varying complexity. Currently the task is implemented as a single step, so to generate multiple evolutions you can "repeat" the instructions in the original dataset. An example can be seen in the following script: examples/pipeline-evol-instruct-alpaca.py
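A minimal sketch of the "repeat the instructions" idea using the datasets library (the column name "input" and the number of rounds are assumptions for illustration):
>>> from datasets import Dataset
>>> instructions = ["Give three tips for staying healthy."]
>>> num_rounds = 2  # hypothetical number of evolution rounds
>>> dataset = Dataset.from_dict({"input": instructions * num_rounds})
>>> dataset.num_rows
2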
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. Not defined for this task. | '' |
Source code in src/distilabel/tasks/text_generation/evol_instruct.py
generate_prompt(input, evolution_method=None, **_)
¶
Generates a prompt following the Evol-Instruct specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
evolution_method | str | the evolution method to be used. If not provided (the default), a random one is chosen, as in the original paper. Available ones are "breadth", "constraints", "deepen", "concretizing" and "reasoning". | None |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.text_generation import EvolInstructTask
>>> task = EvolInstructTask()
>>> task.generate_prompt("Give three tips for staying healthy.")
Prompt(
system_prompt="",
formatted_prompt="I want you to act as a Prompt ...",
)
Source code in src/distilabel/tasks/text_generation/evol_instruct.py
parse_output(output)
¶
Parses the output of the model into the desired format, applying the elimination step for bad generations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output | str | the output of the model. | required |
Note
The elimination step is applied to the output, but only steps 2-4 from the paper are implemented.
Refer to section 3.2, Elimination Evolving, in WizardLM: Empowering Large Language Models to Follow Complex Instructions for more information on the elimination evolving step, and take a look at the _elimination_evolving method for details of the implementation.
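As a rough illustration of what such a check may look like (this is not the actual _elimination_evolving implementation; refer to the source for the real rules), step 4 discards evolutions that leak wording from the evolving meta prompt:
>>> def copies_meta_prompt(evolved_instruction: str) -> bool:
...     # illustrative version of step 4: the evolution leaked meta-prompt wording
...     leaked_terms = ("given prompt", "rewritten prompt", "#rewritten prompt#")
...     return any(term in evolved_instruction.lower() for term in leaked_terms)
>>> copies_meta_prompt("#Rewritten Prompt#: Give three tips for staying healthy.")
True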
Source code in src/distilabel/tasks/text_generation/evol_instruct.py
EvolQualityTask
dataclass
¶
Bases: EvolInstructTask
A TextGenerationTask following the Deita specification for improving the quality of instructions.
From the reference repository: DEITA (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models using data samples automatically selected with our proposed approach.
The task is defined as follows: starting from an initial (simpler) instruction response, select an evolving method to upgrade the quality of the response. The evolving methods include the following operations: add "helpfulness", "relevance", "depth", "creativity" and "details".
Since the evolved responses are generated by LLMs, sometimes the evolving will fail. We adopt a response eliminator to filter out the failed responses, a step called Elimination Evolving, but we don't apply the step of asking the LLM again when the answer is a copy of the prompt that was used. Note that we slightly modify the elimination evolving step from the original paper to allow for filtering of the responses.
This evolutionary process can be repeated for several rounds to obtain instruction data of varying complexity. Currently the task is implemented as a single step, so to generate multiple evolutions you can "repeat" the instructions in the original dataset. An example of a similar implementation with EvolInstruct can be seen in the following script: examples/pipeline-evol-instruct-alpaca.py
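A minimal sketch of building a quality-evolution prompt for an instruction/response pair (assuming EvolQualityTask is importable from distilabel.tasks.text_generation like the other tasks on this page; the LLM call that produces the rewritten response is omitted):
>>> from distilabel.tasks.text_generation import EvolQualityTask
>>> task = EvolQualityTask()
>>> prompt = task.generate_prompt(
...     "Give three tips for staying healthy.",
...     "1. Eat healthy food. 2. Exercise. 3. Sleep well.",
...     evolution_method="helpfulness",
... )
>>> # send `prompt` to an LLM to obtain a higher-quality rewrite of the response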
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. Not defined for this task. | '' |
References
Source code in src/distilabel/tasks/text_generation/evol_quality.py
generate_prompt(input, generation, evolution_method=None, **_)
¶
Generates a prompt following the Evol-Quality specification of the Deita paper.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generation | str | the response to the input whose quality will be evolved. | required |
evolution_method | str | the evolution method to be used. If not provided (the default), a random one is chosen, as in the original paper. Available ones are "helpfulness", "relevance", "deepen", "creativity" and "details". | None |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.text_generation import EvolQualityGeneratorTask
>>> task = EvolQualityGeneratorTask()
>>> task.generate_prompt("Give three tips for staying healthy.", "1. Eat healthy food. 2. Exercise. 3. Sleep well.")
Prompt(
system_prompt="",
formatted_prompt="I want you to act as a Prompt ...",
)
Source code in src/distilabel/tasks/text_generation/evol_quality.py
parse_output(output)
¶
Parses the output of the model into the desired format, applying the elimination step for bad generations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output | str | the output of the model. | required |
Note
The elimination step is applied to the output, but only steps 2-4 from the paper are implemented.
Refer to section 3.2, Elimination Evolving, in WizardLM: Empowering Large Language Models to Follow Complex Instructions for more information on the elimination evolving step, and take a look at the _elimination_evolving method for details of the implementation.
Source code in src/distilabel/tasks/text_generation/evol_quality.py
JudgeLMTask
dataclass
¶
Bases: PreferenceTask
A PreferenceTask following the prompt template used by JudgeLM.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | 'You are a helpful and precise assistant for checking the quality of the answer.' |
task_description | Union[str, None] | the description of the task. | 'We would like to request your feedback on the performance of {num_responses} AI assistants in response to the user question displayed above.\nPlease rate the helpfulness, relevance, accuracy, level of details of their responses. Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.\nPlease first output a single line containing only {num_responses} values indicating the scores for Assistants 1 to {num_responses}, respectively. The {num_responses} scores are separated by a space. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.' |
References
Source code in src/distilabel/tasks/preference/judgelm.py
generate_prompt(input, generations, **_)
¶
Generates a prompt following the JudgeLM specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.preference import JudgeLMTask
>>> task = JudgeLMTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="[Question] What are the first 5 Fibonacci numbers? ...",
)
Source code in src/distilabel/tasks/preference/judgelm.py
parse_output(output)
¶
Parses the output of the model into the desired format.
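The default task_description shown above asks the model to write the {num_responses} scores on the first line, separated by spaces, followed by the explanation. A rough, illustrative way to extract the scores from an output that follows that format (not the actual parse_output implementation):
>>> raw_output = "8 6\nAssistant 1 gives more detail and covers edge cases ..."  # hypothetical model output
>>> [float(score) for score in raw_output.splitlines()[0].split()]
[8.0, 6.0]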
Source code in src/distilabel/tasks/preference/judgelm.py
PrometheusTask
dataclass
¶
Bases: CritiqueTask
A CritiqueTask following the prompt template used by Prometheus.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | 'You are a fair evaluator language model.' |
scoring_criteria | str | the scoring criteria to be used for the task, which defines the scores described below. | required |
score_descriptions | Dict[int, str] | the descriptions of the scores, where the key is the rating value (ideally consecutive) and the value is the description of that rating. | required |
Disclaimer
Since the Prometheus model has been trained on data generated with the OpenAI API, its prompting strategy may only be consistent with either GPT-3.5 / GPT-4 from the OpenAI API or with their own model. Any other model may fail to generate a structured output, or may provide an incorrect / inaccurate critique.
References
Source code in src/distilabel/tasks/critique/prometheus.py
generate_prompt(input, generations, ref_completion, **_)
¶
Generates a prompt following the Prometheus specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt, in this case, the ones to be critiqued. | required |
ref_completion | str | the reference completion to be used for the prompt, assumed to be the one with the highest score. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.critique import PrometheusTask
>>> task = PrometheusTask(
... scoring_criteria="Overall quality of the responses provided.",
... score_descriptions={0: "false", 1: "partially false", 2: "average", 3: "partially true", 4: "true"},
... )
>>> task.generate_prompt(
... input="What are the first 5 Fibonacci numbers?",
... generations=["0 1 1 2 3", "0 1 1 2 3"],
... ref_completion="0 1 1 2 3",
... )
Prompt(
system_prompt="You are a fair evaluator language model.",
formatted_prompt=""###Task Description:...",
)
Source code in src/distilabel/tasks/critique/prometheus.py
parse_output(output)
¶
Parses the output of the model into the desired format.
Source code in src/distilabel/tasks/critique/prometheus.py
Prompt
dataclass
¶
A dataclass representing a Prompt.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt. | required |
formatted_prompt | str | the formatted prompt. | required |
Examples:
>>> from distilabel.tasks.prompt import Prompt
>>> prompt = Prompt(
... system_prompt="You are a helpful assistant.",
... formatted_prompt="What are the first 5 Fibonacci numbers?",
... )
Source code in src/distilabel/tasks/prompt.py
format_as(format)
¶
Formats the prompt as the specified format.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
format | SupportedFormats | the format to be used for the prompt, one of the values defined in SupportedFormats. | required |
Returns:

Type | Description |
---|---|
Union[str, List[ChatCompletion]] | the formatted prompt. |
Raises:

Type | Description |
---|---|
ValueError | if the specified format is not supported. |
Examples:
>>> from distilabel.tasks.prompt import Prompt
>>> prompt = Prompt(
... system_prompt="You are a helpful assistant.",
... formatted_prompt="What are the first 5 Fibonacci numbers?",
... )
>>> prompt.format_as("default")
'You are a helpful assistant. What are the first 5 Fibonacci numbers?'
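Chat-oriented formats return a list of messages rather than a single string; assuming "openai" is among the supported formats, the result would look roughly as follows:
>>> prompt.format_as("openai")
[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'What are the first 5 Fibonacci numbers?'}]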
Source code in src/distilabel/tasks/prompt.py
QualityScorerTask
dataclass
¶
Bases: PreferenceTaskNoRationale
A PreferenceTask following the Quality Scorer specification for rating instructions in terms of quality.
This task is inspired by the Evol Quality Scorer in the Deita framework: Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).
The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of quality, obtaining a quality score q for each instruction.
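In the Deita pipeline, the complexity score c and the quality score q are then typically combined (for instance, multiplied) into a single evol score used to rank samples before the final diversity-aware selection. A purely illustrative sketch with hypothetical scores:
>>> c_scores = [2, 5, 3]  # hypothetical complexity scores from ComplexityScorerTask
>>> q_scores = [4, 1, 5]  # hypothetical quality scores from QualityScorerTask
>>> evol_scores = [c * q for c, q in zip(c_scores, q_scores)]
>>> sorted(range(len(evol_scores)), key=lambda i: evol_scores[i], reverse=True)
[2, 0, 1]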
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. Not defined for this task. | '' |
Source code in src/distilabel/tasks/preference/quality_scorer.py
generate_prompt(input, generations, **_)
¶
Generates a prompt following the Evol Quality specification in Deita.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the instruction for which the model will score the responses. | required |
generations | List[str] | the generations to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.preference import QualityScorerTask
>>> task = QualityScorerTask()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="",
formatted_prompt="Rank the following responses provided ..."
)
Source code in src/distilabel/tasks/preference/quality_scorer.py
parse_output(output)
¶
Parses the output of the task, returning a list with the rating of each instruction.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output | str | the raw output of the LLM. | required |
Returns:

Type | Description |
---|---|
Dict[str, List[str]] | a dict containing the ratings of each instruction. |
Source code in src/distilabel/tasks/preference/quality_scorer.py
SelfInstructTask
dataclass
¶
Bases: InstructTaskMixin, TextGenerationTask
A TextGenerationTask following the Self-Instruct specification for building the prompts.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. | 'You are an expert prompt writer, writing the best and most diverse prompts for a variety of tasks. You are given a task description and a set of instructions for how to write the prompts for an specific AI application.' |
principles | Dict[str, List[str]] | the principles to be used for the system prompt. | field(default_factory=lambda : {'harmlessness': harmlessness, 'helpfulness': helpfulness, 'truthfulness': truthfulness, 'honesty': honesty, 'verbalized_calibration': verbalized_calibration}, repr=False) |
principles_distribution | Union[Dict[str, float], Literal[balanced], None] | the distribution of principles to be used for the system prompt. Defaults to None. | None |
application_description | str | the description of the AI application. Defaults to "AI assistant". | 'AI assistant' |
num_instructions | int | the number of instructions to be used for the prompt. Defaults to 5. | 5 |
criteria_for_query_generation | str | the criteria for query generation that we want our model to follow. The default value covers the default behaviour of SelfInstructTask. This value is passed to the .jinja template, where extra instructions are added to ensure the correct output format. | 'Incorporate a diverse range of verbs, avoiding repetition.\nEnsure queries are compatible with AI model\'s text generation functions and are limited to 1-2 sentences.\nDesign queries to be self-contained and standalone.\nBlend interrogative (e.g., "What is the significance of x?") and imperative (e.g., "Detail the process of x.") styles.' |
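As an illustration, the defaults above can be overridden to target a concrete application (the values below are made up for the example):
>>> from distilabel.tasks.text_generation import SelfInstructTask
>>> task = SelfInstructTask(
...     application_description="An AI assistant that writes SQL queries.",  # hypothetical application
...     num_instructions=3,
... )
>>> prompt = task.generate_prompt("Queries about joining two tables.")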
References
Source code in src/distilabel/tasks/text_generation/self_instruct.py
generate_prompt(input, **_)
¶
Generates a prompt following the Self-Instruct specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.text_generation import SelfInstructTask
>>> task = SelfInstructTask(system_prompt="You are a helpful assistant.", num_instructions=2)
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?")
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="# Task Description ...",
)
Source code in src/distilabel/tasks/text_generation/self_instruct.py
parse_output(output)
¶
Parses the output of the model into the desired format.
Task
¶
Bases: ABC, _Serializable
Abstract class used to define the methods required to create a Task, to be used within an LLM.
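For context, a Task is typically handed to an LLM wrapper, which calls generate_prompt before each generation and parse_output on the raw completions. A minimal sketch, assuming an OpenAI-backed LLM class with task and model arguments is available (names shown are indicative, not guaranteed):
>>> from distilabel.llm import OpenAILLM  # assumed import path
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> llm = OpenAILLM(task=TextGenerationTask(), model="gpt-3.5-turbo")  # indicative arguments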
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | required |
task_description | Union[str, None] | the description of the task. | required |
Raises:

Type | Description |
---|---|
ValueError | if the |
Source code in src/distilabel/tasks/base.py
validate_dataset(columns_in_dataset)
¶
Validates that the dataset contains the required columns for the task.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
columns_in_dataset | List[str] | the columns in the dataset. | required |
Raises:

Type | Description |
---|---|
KeyError | if the dataset does not contain the required columns. |
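A small illustrative check; the required column names come from the concrete task's input_args_names, and "input" is assumed here for a text generation task:
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> task = TextGenerationTask()
>>> task.validate_dataset(["input"])  # passes silently when the required columns are present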
Source code in src/distilabel/tasks/base.py
TextGenerationTask
dataclass
¶
Bases: Task
A base Task definition for text generation using LLMs.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used. | "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information." |
principles | Dict[str, List[str]] | the principles to be used for the system prompt. | field(default_factory=lambda : {'harmlessness': harmlessness, 'helpfulness': helpfulness, 'truthfulness': truthfulness, 'honesty': honesty, 'verbalized_calibration': verbalized_calibration}, repr=False) |
principles_distribution | Union[Dict[str, float], Literal['balanced'], None] | the distribution of principles to be used for the system prompt. Defaults to None. | None |
Examples:
Source code in src/distilabel/tasks/text_generation/base.py
input_args_names: List[str]
property
¶
Returns the input args names for the task.
output_args_names: List[str]
property
¶
Returns the output args names for the task.
__post_init__()
¶
Validates the principles_distribution if it is a dict.
Raises:

Type | Description |
---|---|
ValueError | if the |
ValueError | if the |
Source code in src/distilabel/tasks/text_generation/base.py
generate_prompt(input, **_)
¶
Generates the prompt to be used for generation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for generation. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.text_generation import TextGenerationTask
>>> task = TextGenerationTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?")
Prompt(system_prompt='You are a helpful assistant.', formatted_prompt='What are the first 5 Fibonacci numbers?')
Source code in src/distilabel/tasks/text_generation/base.py
parse_output(output)
¶
Parses the output of the model into the desired format.
to_argilla_record(dataset_row)
¶
Converts a dataset row to an Argilla FeedbackRecord.
Source code in src/distilabel/tasks/text_generation/base.py
UltraCMTask
dataclass
¶
Bases: CritiqueTask
A CritiqueTask following the prompt template used by UltraCM (from UltraFeedback).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | "User: A one-turn chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, very detailed, and polite answers to the user's questions.</s>" |
Disclaimer
Since the UltraCM model has been trained on data generated with the OpenAI API, its prompting strategy may only be consistent with either GPT-3.5 / GPT-4 from the OpenAI API or with their own model. Any other model may fail to generate a structured output, or may provide an incorrect / inaccurate critique.
References
Source code in src/distilabel/tasks/critique/ultracm.py
generate_prompt(input, generations, **_)
¶
Generates a prompt following the UltraCM specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt, in this case, the ones to be critiqued. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.critique import UltraCMTask
>>> task = UltraCMTask()
>>> task.generate_prompt(
... input="What are the first 5 Fibonacci numbers?",
... generations=["0 1 1 2 3", "0 1 1 2 3"],
... )
Prompt(
system_prompt="User: A one-turn chat between a curious user ...",
formatted_prompt="User: Given my answer to an instruction, your role ...",
)
Source code in src/distilabel/tasks/critique/ultracm.py
parse_output(output)
¶
Parses the output of the model into the desired format.
Source code in src/distilabel/tasks/critique/ultracm.py
UltraFeedbackTask
dataclass
¶
Bases: PreferenceTask
A PreferenceTask following the prompt template used by ULTRAFEEDBACK.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | 'Your role is to evaluate text quality based on given criteria.' |
task_description | Union[str, None] | the description of the task. | required |
ratings | Union[List[Rating], None] | the ratings to be used for the task. | required |
References
Source code in src/distilabel/tasks/preference/ultrafeedback.py
for_overall_quality(system_prompt=None, task_description=None, ratings=None)
classmethod
¶
Classmethod for the UltraFeedbackTask subtask defined by Argilla, in order to evaluate all the criteria originally defined in UltraFeedback at once, in a single subtask.
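For example, the subtask can be created with its defaults or with a custom system prompt (the prompt below is made up for the example):
>>> from distilabel.tasks.preference import UltraFeedbackTask
>>> task = UltraFeedbackTask.for_overall_quality(
...     system_prompt="You are a strict but fair evaluator of text quality.",  # hypothetical override
... )
>>> prompt = task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])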
Source code in src/distilabel/tasks/preference/ultrafeedback.py
generate_prompt(input, generations, **_)
¶
Generates a prompt following the ULTRAFEEDBACK specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.preference import UltraFeedbackTask
>>> task = UltraFeedbackTask.for_overall_quality()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="Your role is to evaluate text quality based on given criteria.",
formatted_prompt="# General Text Quality Assessment...",
)
Source code in src/distilabel/tasks/preference/ultrafeedback.py
parse_output(output)
¶
Parses the output of the model into the desired format.
Source code in src/distilabel/tasks/preference/ultrafeedback.py
UltraJudgeTask
dataclass
¶
Bases: PreferenceTask
A PreferenceTask for the UltraJudge task. The UltraJudge task has been defined at Argilla specifically for a better evaluation using AI Feedback. The task is defined based on both UltraFeedback and JudgeLM, but with several improvements / modifications.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
system_prompt | str | the system prompt to be used for generation. | "You are an evaluator tasked with assessing AI assistants' responses from the perspective of typical user preferences. Your critical analysis should focus on human-like engagement, solution effectiveness, accuracy, clarity, and creativity. Approach each response as if you were the user, considering how well the response meets your needs and expectations in a real-world scenario. Provide detailed feedback that highlights strengths and areas for improvement in each response, keeping in mind the goal of simulating a human's preferred choice. Your evaluation should be impartial and thorough, reflecting a human's perspective in preferring responses that are practical, clear, authentic, and aligned with their intent. Avoid bias, and focus on the content and quality of the responses." |
task_description | Union[str, None] | the description of the task. | "Your task is to rigorously evaluate the performance of {num_responses} AI assistants, simulating a human's perspective. You will assess each response based on four key domains, reflecting aspects that are typically valued by humans: {areas}. First provide a score between 0 and 10 and write a detailed feedback for each area and assistant. Finally, provide a list of {num_responses} scores, each separated by a space, to reflect the performance of Assistants 1 to {num_responses}." |
areas | List[str] | the areas to be used for the task. Defaults to a list of four areas: "Practical Accuracy", "Clarity & Transparency", "Authenticity & Reliability", and "Compliance with Intent". | field(default_factory=lambda : ['Practical Accuracy', 'Clarity & Transparency', 'Authenticity & Reliability', 'Compliance with Intent']) |
References
Source code in src/distilabel/tasks/preference/ultrajudge.py
areas_str: str
property
¶
Returns a string representation of the areas.
extract_area_score_and_rationale_regex: str
property
¶
Returns a regex to extract the area, score, and rationale from the output.
extract_final_scores_regex: str
property
¶
Returns a regex to extract the final scores from the output.
output_args_names: List[str]
property
¶
Returns the names of the output arguments of the task.
generate_prompt(input, generations, **_)
¶
Generates a prompt following the UltraJudge specification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | str | the input to be used for the prompt. | required |
generations | List[str] | the generations to be used for the prompt. | required |
Returns:

Name | Type | Description |
---|---|---|
Prompt | Prompt | the generated prompt. |
Examples:
>>> from distilabel.tasks.preference import UltraJudgeTask
>>> task = UltraJudgeTask(system_prompt="You are a helpful assistant.")
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
system_prompt="You are a helpful assistant.",
formatted_prompt="Your task is to rigorously evaluate the performance of ...",
)
Source code in src/distilabel/tasks/preference/ultrajudge.py
parse_output(output)
¶
Parses the output of the model into the desired format.