Contrastive Learning From AI Revisions (CLAIR)¶
"Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment" introduces both Contrastive
Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. While APO can be found in TRL, we have implemented a task for CLAIR in distilabel
.
CLAIR is a method for creating preference pairs which minimally revises one output to express a preference, resulting in a more precise learning signal as opposed to conventional methods which use a judge to select a preferred response.
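For intuition, a contrastive pair produced this way could look like the following hand-written, illustrative example (not taken from any real dataset):

```python
# Illustrative CLAIR-style pair: "chosen" is a minimal revision of
# "rejected", so the two responses differ only where the original
# answer was wrong.
pair = {
    "prompt": "What is the capital of Australia?",
    "rejected": "The capital of Australia is Sydney, its largest city.",
    "chosen": "The capital of Australia is Canberra; Sydney is its largest city.",
}
```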
The authors of the original paper shared a collection of datasets from CLAIR and APO, where ContextualAI/ultrafeedback_clair_32k corresponds to the CLAIR implementation.
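A quick way to peek at that dataset with datasets (the column names below are the ones the transformation later in this page relies on):

```python
from datasets import load_dataset

ds = load_dataset("ContextualAI/ultrafeedback_clair_32k", split="train")
print(ds.column_names)  # expected to include "prompt", "chosen" and "rejected"
print(ds[0]["prompt"])  # inspect one instruction
```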
Replication¶
Note

The section is named Replication, but in this case we are showing how to use the CLAIR task to create revisions for your generations using distilabel.
To showcase CLAIR we will use the CLAIR task implemented in distilabel, reusing for testing a small sample of ContextualAI/ultrafeedback_clair_32k, the dataset already generated by ContextualAI.
Installation¶
To reproduce the code below, one will need to install distilabel as follows:
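A minimal install, assuming the hf-inference-endpoints extra since the example below relies on InferenceEndpointsLLM:

```bash
pip install "distilabel[hf-inference-endpoints]"
```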
Depending on the LLM provider you want to use, the requirements may vary, so take a look at the corresponding extra dependencies in that case. For this example we are using the free Hugging Face Inference Endpoints, but they won't be suitable for a bigger dataset.
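For instance, a sketch of swapping in another provider, assuming the openai extra is installed and OPENAI_API_KEY is set in the environment (the model name is illustrative):

```python
from distilabel.models import OpenAILLM

# Hypothetical alternative backend for the CLAIR task; any distilabel
# LLM with the same interface should work in its place.
llm = OpenAILLM(
    model="gpt-4o-mini",
    generation_kwargs={"temperature": 0.7, "max_new_tokens": 4096},
)
```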
Building blocks¶
In this case, where we already have instructions and their generations, the only building block we need besides loading the data is:

- CLAIR to generate the revisions (see the standalone sketch below).
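As a quick sanity check, the task can also be run on its own, outside a Pipeline. A minimal sketch with a made-up input pair; we assume the output rows carry revision and rationale fields:

```python
from distilabel.models import InferenceEndpointsLLM
from distilabel.steps.tasks import CLAIR

clair = CLAIR(
    llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-70B-Instruct")
)
clair.load()  # load the task and its underlying LLM before processing

result = next(
    clair.process(
        [
            {
                "task": "Explain why the sky is blue.",  # made-up example
                "student_solution": "The sky is blue because it reflects the ocean.",
            }
        ]
    )
)
print(result[0].keys())  # expected to include "revision" and "rationale"
```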
Code¶
Let's see the full pipeline applied to ContextualAI/ultrafeedback_clair_32k in distilabel:
```python
from typing import Any, Dict

from datasets import load_dataset

from distilabel.models import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import CLAIR


def transform_ultrafeedback(example: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "task": example["prompt"],
        "student_solution": example["rejected"][1]["content"],
    }


dataset = (
    load_dataset("ContextualAI/ultrafeedback_clair_32k", split="train")
    .select(range(10))  # We collect just 10 examples
    .map(transform_ultrafeedback)  # Apply the transformation to get just the text
)

with Pipeline(name="CLAIR UltraFeedback sample") as pipeline:
    clair = CLAIR(  # (1)
        llm=InferenceEndpointsLLM(
            model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
            generation_kwargs={
                "temperature": 0.7,
                "max_new_tokens": 4096,
            },
        )
    )

if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)  # (2)
    distiset.push_to_hub(repo_id="username/clair-test", include_script=True)  # (3)
```
1. This Pipeline uses just CLAIR because we already have the generations, but one can include a first task to create generations from instructions, and then apply the revisions with CLAIR.
2. Include the dataset directly in the run method for simplicity.
3. Push the distiset to the Hub with the script for reproducibility.
An example dataset can be found at: distilabel-internal-testing/clair-test.
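If you then want APO/DPO-style preference pairs out of the revisions, one option is to pair each revision against the original solution. A hypothetical post-processing sketch, assuming the resulting dataset keeps the task, student_solution and revision columns:

```python
from typing import Any, Dict


def to_preference(example: Dict[str, Any]) -> Dict[str, Any]:
    # chosen = the minimally revised answer, rejected = the original one
    return {
        "prompt": example["task"],
        "chosen": example["revision"],
        "rejected": example["student_solution"],
    }


# Hypothetical usage on the generated distiset:
# preference_ds = distiset["default"]["train"].map(to_preference)
```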