distilabel
AI Feedback (AIF) framework to build datasets with and for LLMs:
- Integrations with the most popular libraries and APIs for LLMs: HF Transformers, OpenAI, vLLM, etc.
- Multiple tasks for Self-Instruct, Preference datasets and more.
- Dataset export to Argilla for easy data exploration and further annotation.
Installation
Requires Python 3.8+.
In addition, the following extras are available:
- `hf-transformers`: for using models available in the `transformers` package via the `TransformersLLM` integration.
- `hf-inference-endpoints`: for using the HuggingFace Inference Endpoints via the `InferenceEndpointsLLM` integration.
- `openai`: for using OpenAI API models via the `OpenAILLM` integration.
- `vllm`: for using the vllm serving engine via the `vLLM` integration.
- `llama-cpp`: for using llama-cpp-python as Python bindings for `llama.cpp`.
- `together`: for using Together Inference via their Python client.
- `argilla`: for exporting the generated datasets to Argilla.
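To see which of these optional integrations are importable in the current environment, a small check along the following lines may help. It is only a sketch: the mapping from extra names to import names (for example `llama-cpp` to `llama_cpp`, or `hf-inference-endpoints` to `huggingface_hub`) is an assumption based on the usual module name of each package, not something defined by distilabel, and `available_extras` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that extra is expected to provide.
EXTRA_MODULES = {
    "hf-transformers": "transformers",
    "hf-inference-endpoints": "huggingface_hub",
    "openai": "openai",
    "vllm": "vllm",
    "llama-cpp": "llama_cpp",
    "together": "together",
    "argilla": "argilla",
}


def available_extras(mapping=EXTRA_MODULES):
    """Return {extra_name: bool}, True when the mapped module can be located."""
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra:24} {'installed' if present else 'missing'}")
```

The check only locates modules without importing them, so it stays fast even when heavy packages are installed.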
Quick example
from datasets import load_dataset
from distilabel.llm import OpenAILLM
from distilabel.pipeline import pipeline
from distilabel.tasks import TextGenerationTask
dataset = (
    load_dataset("HuggingFaceH4/instruction-dataset", split="test[:10]")
    .remove_columns(["completion", "meta"])
    .rename_column("prompt", "input")
)
task = TextGenerationTask() # (1)
generator = OpenAILLM(task=task, max_new_tokens=512) # (2)
pipeline = pipeline("preference", "instruction-following", generator=generator) # (3)
dataset = pipeline.generate(dataset)
1. Create a `Task` for generating text given an instruction.
2. Create an `LLM` for generating text using the `Task` created in the first step. As the `LLM` will generate text, it will be a `generator`.
3. Create a pre-defined `Pipeline` using the `pipeline` function and the `generator` created in step 2. The `pipeline` function will create a `labeller` LLM using `OpenAILLM` with the `UltraFeedback` task for instruction-following assessment.
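Conceptually, these steps amount to a generate-then-rate flow: a generator produces candidate responses for each input, and a labeller scores them. The toy sketch below illustrates that flow in plain Python. `toy_generator`, `toy_labeller`, and `run_preference_flow` are hypothetical stand-ins written for illustration; they do not reflect distilabel's actual classes, output columns, or internals.

```python
from typing import Callable, Dict, List


def toy_generator(instruction: str, num_generations: int = 2) -> List[str]:
    """Stand-in for a generator LLM: returns dummy candidate responses."""
    return [f"Response {i + 1} to: {instruction}" for i in range(num_generations)]


def toy_labeller(instruction: str, generations: List[str]) -> List[float]:
    """Stand-in for a labeller LLM: rates each candidate (here, simply by length)."""
    return [float(len(g)) for g in generations]


def run_preference_flow(
    rows: List[Dict[str, str]],
    generator: Callable[[str], List[str]],
    labeller: Callable[[str, List[str]], List[float]],
) -> List[Dict[str, object]]:
    """For each row, generate candidates from its "input" and attach ratings."""
    results = []
    for row in rows:
        generations = generator(row["input"])
        ratings = labeller(row["input"], generations)
        results.append(
            {"input": row["input"], "generations": generations, "ratings": ratings}
        )
    return results


if __name__ == "__main__":
    for item in run_preference_flow(
        [{"input": "Explain what a preference dataset is."}],
        toy_generator,
        toy_labeller,
    ):
        print(item)
```

The `"input"` key mirrors the column name produced by `rename_column("prompt", "input")` in the quick example; everything else is illustrative.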
Note
To run the script successfully, ensure you have assigned your OpenAI API key to the OPENAI_API_KEY environment variable.
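A quick way to fail early when the key is missing is to check the environment before building the pipeline. The snippet below is a minimal sketch; `require_env` is a hypothetical helper, not part of distilabel or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return value


if __name__ == "__main__":
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY is set.")
```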
For a more complete example, check out our awesome notebook on Google Colab:
Navigation
- Understand the components and their interactions.
- Technical description of the classes and functions.