How-to guides

Welcome to the how-to guides section! Here you will find a collection of guides to help you get started with Distilabel. The guides are divided into two categories: the basic guides cover the core concepts of Distilabel, while the advanced guides explore more advanced features.

Basic

Define Steps for your Pipeline

Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or any other general task.
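
As a rough illustration of the idea, and not Distilabel's actual API, a step can be pictured as a function that receives a batch of rows (dictionaries) and yields a transformed batch. The names `Row` and `add_length` below are hypothetical and exist only for this sketch.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]  # hypothetical alias: one row of data


def add_length(batch: List[Row]) -> Iterator[List[Row]]:
    """Hypothetical step: add a `length` column computed from `text`."""
    # Build new rows instead of mutating the input batch.
    yield [{**row, "length": len(row["text"])} for row in batch]


batch = [{"text": "hello"}, {"text": "distilabel"}]
result = next(add_length(batch))
print(result)
```

Working on batches rather than single rows keeps the same function usable for generating, filtering, or reshaping data.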

Define Steps
Define Tasks that rely on LLMs

Tasks are a specific type of step that relies on large language models (LLMs) to generate data.
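
A minimal sketch of this relationship, assuming a task boils down to "format a prompt, call a model, store the output". Everything here (`LLM`, `fake_llm`, `generate_task`, the prompt template) is hypothetical and is not taken from Distilabel; the model is a stub so the example runs offline.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical: anything that maps a prompt to text


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: simply upper-cases the prompt."""
    return prompt.upper()


def generate_task(batch: List[Dict[str, str]], llm: LLM) -> List[Dict[str, str]]:
    """Hypothetical task: build a prompt per row, call the LLM, keep the output."""
    outputs = []
    for row in batch:
        prompt = f"Answer briefly: {row['instruction']}"
        outputs.append({**row, "generation": llm(prompt)})
    return outputs


rows = generate_task([{"instruction": "What is 2 + 2?"}], fake_llm)
print(rows[0]["generation"])
```

Because the task only depends on the callable, the stub could be swapped for a local model or a remote API client without touching the task logic.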

Define Tasks
Define LLMs as local or remote models

LLMs are the core of your tasks. They are used to integrate with local models or remote APIs.

Define LLMs
Execute Steps and Tasks in a Pipeline

A Pipeline is where you put all your steps and tasks together to create a workflow.
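
To make the idea concrete, here is a plain-Python sketch of a workflow as a linear chain of steps, the simplest possible shape. It does not use Distilabel, and all names (`Step`, `run_pipeline`, the three example steps) are hypothetical.

```python
from typing import Any, Callable, Dict, List

Batch = List[Dict[str, Any]]
Step = Callable[[Batch], Batch]  # hypothetical step signature


def load_data(_: Batch) -> Batch:
    """Generator-style step: ignores its input and produces the initial rows."""
    return [{"text": "a b c"}, {"text": "d e"}]


def count_words(batch: Batch) -> Batch:
    """Add a `words` column with the number of whitespace-separated tokens."""
    return [{**row, "words": len(row["text"].split())} for row in batch]


def keep_long(batch: Batch) -> Batch:
    """Keep only rows with at least three words."""
    return [row for row in batch if row["words"] >= 3]


def run_pipeline(steps: List[Step]) -> Batch:
    """Feed the output of each step into the next one, in order."""
    batch: Batch = []
    for step in steps:
        batch = step(batch)
    return batch


output = run_pipeline([load_data, count_words, keep_long])
print(output)
```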

Execute Pipeline

Advanced

Using the Distiset dataset object

Distiset is a dataset object based on the datasets library that can be used to store and manipulate data.

Distiset
Export data to Argilla

Argilla is a platform that can be used to store, search, and apply feedback to datasets.

Argilla
Using a file system to pass batch data between steps

A file system can be used to pass data between steps in a pipeline.

File System
Using CLI to explore and re-run existing Pipelines

The CLI can be used to explore and re-run existing pipelines from the command line.

CLI
Cache and recover pipeline executions

Caching can be used to recover pipeline executions and avoid losing data and precious LLM calls.
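
The general idea can be sketched in plain Python, assuming results are stored on disk under a key derived from the input batch. This is not how Distilabel implements its cache; `cached`, `expensive_generation`, and the file layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Batch = List[Dict[str, str]]
calls = {"count": 0}  # tracks how often the expensive function really runs


def expensive_generation(batch: Batch) -> Batch:
    """Stand-in for a costly LLM call: reverses each instruction."""
    calls["count"] += 1
    return [{**row, "generation": row["instruction"][::-1]} for row in batch]


def cached(step: Callable[[Batch], Batch], batch: Batch, cache_dir: Path) -> Batch:
    """Run `step` only if no result for this exact batch is stored on disk."""
    key = hashlib.sha256(json.dumps(batch, sort_keys=True).encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = step(batch)
    path.write_text(json.dumps(result))
    return result


with tempfile.TemporaryDirectory() as tmp:
    data = [{"instruction": "abc"}]
    first = cached(expensive_generation, data, Path(tmp))
    second = cached(expensive_generation, data, Path(tmp))  # served from disk

print(first == second, calls["count"])
```

The second call returns the stored result, so the expensive function runs only once for the same input.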

Caching
Structured data generation

Structured data generation can be used to generate data with a specific structure like JSON, function calls, etc.
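
As a simplified illustration of what "a specific structure" means, the sketch below checks that raw model output is a JSON object with the expected fields and types. Note the swap: this validates the text after the fact, whereas structured generation tools typically constrain the model while it generates. `SCHEMA` and `parse_structured` are hypothetical names, and the raw string stands in for a real model response.

```python
import json
from typing import Any, Dict

SCHEMA: Dict[str, type] = {"name": str, "age": int}  # hypothetical target structure


def parse_structured(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse `raw` as JSON and check it has every field with the expected type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


raw_output = '{"name": "Ada", "age": 36}'  # pretend this came from an LLM
record = parse_structured(raw_output, SCHEMA)
print(record)
```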

Structured Generation