
How-to guides

Welcome to the how-to guides section! Here you will find a collection of guides that will help you get started with Distilabel. The guides are divided into two categories: basic and advanced. The basic guides cover the core concepts of Distilabel, while the advanced guides explore its more specialized features.


  • Define Steps for your Pipeline

    Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or perform any other general task.

    Define Steps

  • Define Tasks that rely on LLMs

    Tasks are a specific type of step that relies on Large Language Models (LLMs) to generate data.

    Define Tasks

  • Define LLMs as local or remote models

    LLMs are the core of your tasks. They are used to integrate with local models or remote APIs.

    Define LLMs

  • Execute Steps and Tasks in a Pipeline

    A Pipeline is where you put all your steps and tasks together to create a workflow.

    Execute Pipeline


  • Using the Distiset dataset object

    Distiset is a dataset object based on the datasets library that can be used to store and manipulate data.


  • Export data to Argilla

    Argilla is a platform that can be used to store, search, and apply feedback to datasets.

    Argilla

  • Using a file system to pass data of batches between steps

    A file system can be used to pass batches of data between steps in a pipeline.

    File System

  • Using CLI to explore and re-run existing Pipelines

    The CLI can be used to explore and re-run existing pipelines from the command line.


  • Cache and recover pipeline executions

    Caching can be used to recover pipeline executions and avoid losing data and precious LLM calls.


  • Structured data generation

    Structured data generation can be used to generate data with a specific structure like JSON, function calls, etc.

    Structured Generation
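
To make the relationship between steps, tasks, LLMs, and pipelines more concrete, here is a minimal conceptual sketch in plain Python. It does not use Distilabel's API: every name in it (`load_instructions`, `fake_llm`, `make_generation_task`, `run_pipeline`) is hypothetical and exists only for illustration, and the "LLM" is a stand-in function rather than a real model. The individual guides above describe the actual classes.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def load_instructions(_: List[Row]) -> List[Row]:
    """Generator-style step: produces rows instead of transforming them."""
    return [{"instruction": "Define a pipeline"}, {"instruction": "Define a step"}]


def fake_llm(prompt: str) -> str:
    """Stand-in for a local model or remote API (assumption: no real LLM is called)."""
    return f"(generated answer to: {prompt})"


def make_generation_task(llm: Callable[[str], str]) -> Step:
    """Task-style step: a step that delegates its work to an LLM."""

    def task(rows: List[Row]) -> List[Row]:
        # Keep the input columns and add the LLM output as a new column.
        return [{**row, "generation": llm(row["instruction"])} for row in rows]

    return task


def run_pipeline(steps: Iterable[Step]) -> List[Row]:
    """Pipeline: runs the steps in order, feeding each one the previous output."""
    rows: List[Row] = []
    for step in steps:
        rows = step(rows)
    return rows


result = run_pipeline([load_instructions, make_generation_task(fake_llm)])
```

In this sketch, swapping `fake_llm` for a function that calls a local model or a remote API would change where the generations come from without touching the task or the pipeline, which is the separation the guides above rely on.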