How-to guides

Welcome to the how-to guides section! Here you will find a collection of guides to help you get started with Distilabel. The guides are divided into two categories: the basic guides cover the core concepts of Distilabel, while the advanced guides explore its more specialized features.

Basic

  • Define Steps for your Pipeline


    Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or any other general task.

    Define Steps

  • Define Tasks that rely on LLMs


    Tasks are a specific type of step that relies on Large Language Models (LLMs) to generate data.

    Define Tasks

  • Define LLMs as local or remote models


    LLMs are the core of your tasks. They are used to integrate with local models or remote APIs.

    Define LLMs

  • Execute Steps and Tasks in a Pipeline


    A Pipeline is where you put all your steps and tasks together to create a workflow.

    Execute Pipeline
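
To make the relationship between steps, tasks, LLMs, and pipelines concrete, here is a minimal plain-Python sketch of the idea. It does not use Distilabel's API: `load_instructions`, `fake_llm`, `generate`, and `run_pipeline` are hypothetical names invented for illustration, and the "LLM" is a stand-in function that makes no model or network call. The linked guides describe the real classes.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
# In this sketch, a "step" is just a function from a batch of rows to a batch of rows.
StepFn = Callable[[List[Row]], List[Row]]


def load_instructions(_: List[Row]) -> List[Row]:
    # Generator-style step: ignores its input and produces the initial rows.
    return [{"instruction": "Say hello"}, {"instruction": "Say goodbye"}]


def fake_llm(prompt: str) -> str:
    # Stand-in for a local model or a remote API (hypothetical, no network call).
    return f"response to: {prompt}"


def generate(rows: List[Row]) -> List[Row]:
    # Task-style step: relies on the (fake) LLM to add a "generation" column.
    return [{**row, "generation": fake_llm(row["instruction"])} for row in rows]


def run_pipeline(steps: Iterable[StepFn]) -> List[Row]:
    # Pipeline: feed each step's output into the next step, in order.
    rows: List[Row] = []
    for step in steps:
        rows = step(rows)
    return rows


result = run_pipeline([load_instructions, generate])
```

Here `result` holds one row per instruction, each extended with the generated text. A real pipeline also handles batching, parallelism, and branching between steps, which this linear sketch leaves out.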

Advanced

  • Using the Distiset dataset object


    Distiset is a dataset object based on the datasets library that can be used to store and manipulate data.

    Distiset

  • Export data to Argilla


    Argilla is a platform that can be used to store, search, and apply feedback to datasets.

    Argilla

  • Using a file system to pass batch data between steps


    A file system can be used to pass batch data between steps in a pipeline.

    File System

  • Using CLI to explore and re-run existing Pipelines


    The CLI can be used to explore and re-run existing pipelines from the command line.

    CLI

  • Cache and recover pipeline executions


    Caching can be used to recover pipeline executions and avoid losing data and precious LLM calls.

    Caching

  • Structured data generation


    Structured data generation can be used to generate data with a specific structure like JSON, function calls, etc.

    Structured Generation
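
As a rough illustration of what "data with a specific structure" means in the structured generation item, the sketch below checks whether a raw model output is a JSON object with the expected fields. It uses only the Python standard library, not Distilabel's API. The `REQUIRED_FIELDS` schema and the `parse_structured` helper are assumptions made up for this example.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical target structure: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "age": int}


def parse_structured(raw: str) -> Optional[Dict[str, Any]]:
    """Return the parsed object if it matches REQUIRED_FIELDS, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The output was not valid JSON at all.
        return None
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a list or a string).
        return None
    for key, expected_type in REQUIRED_FIELDS.items():
        # A missing key yields None, which fails the type check.
        if not isinstance(data.get(key), expected_type):
            return None
    return data
```

A check like this only validates output after the fact. Structured generation goes further by constraining the model so that its output conforms to the structure in the first place.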