FormatTextGenerationDPO¶
Format the output of your LLMs for Direct Preference Optimization (DPO).
FormatTextGenerationDPO
is a Step
that formats the output of the combination of a TextGeneration
task with a preference Task
i.e. a task generating ratings
, so that those are used to rank the
existing generations and provide the chosen
and rejected
generations based on the ratings
.
Use this step to transform the output of a combination of a TextGeneration
+ a preference task such as
UltraFeedback
following the standard formatting from frameworks such as axolotl
or alignment-handbook
.
Note¶
The generations
column should contain at least two generations, the ratings
column should
contain the same number of ratings as generations.
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph Columns
ICOL0[system_prompt]
ICOL1[instruction]
ICOL2[generations]
ICOL3[generation_models]
ICOL4[ratings]
end
subgraph New columns
OCOL0[prompt]
OCOL1[prompt_id]
OCOL2[chosen]
OCOL3[chosen_model]
OCOL4[chosen_rating]
OCOL5[rejected]
OCOL6[rejected_model]
OCOL7[rejected_rating]
end
end
subgraph FormatTextGenerationDPO
StepInput[Input Columns: system_prompt, instruction, generations, generation_models, ratings]
StepOutput[Output Columns: prompt, prompt_id, chosen, chosen_model, chosen_rating, rejected, rejected_model, rejected_rating]
end
ICOL0 --> StepInput
ICOL1 --> StepInput
ICOL2 --> StepInput
ICOL3 --> StepInput
ICOL4 --> StepInput
StepOutput --> OCOL0
StepOutput --> OCOL1
StepOutput --> OCOL2
StepOutput --> OCOL3
StepOutput --> OCOL4
StepOutput --> OCOL5
StepOutput --> OCOL6
StepOutput --> OCOL7
StepInput --> StepOutput
Inputs¶
-
system_prompt (
str
, optional): The system prompt used within theLLM
to generate thegenerations
, if available. -
instruction (
str
): The instruction used to generate thegenerations
with theLLM
. -
generations (
List[str]
): The generations produced by theLLM
. -
generation_models (
List[str]
, optional): The model names used to generate thegenerations
, only available if themodel_name
from theTextGeneration
task/s is combined into a single column named this way, otherwise, it will be ignored. -
ratings (
List[float]
): The ratings for each of thegenerations
, produced by a preference task such asUltraFeedback
.
Outputs¶
-
prompt (
str
): The instruction used to generate thegenerations
with theLLM
. -
prompt_id (
str
): TheSHA256
hash of theprompt
. -
chosen (
List[Dict[str, str]]
): Thechosen
generation based on theratings
. -
chosen_model (
str
, optional): The model name used to generate thechosen
generation, if thegeneration_models
are available. -
chosen_rating (
float
): The rating of thechosen
generation. -
rejected (
List[Dict[str, str]]
): Therejected
generation based on theratings
. -
rejected_model (
str
, optional): The model name used to generate therejected
generation, if thegeneration_models
are available. -
rejected_rating (
float
): The rating of therejected
generation.