How-to guides
Welcome to the how-to guides section! Here you will find a collection of guides to help you get started with Distilabel. The guides are divided into two categories: basic and advanced. The basic guides introduce the core concepts of Distilabel, while the advanced guides cover more specialized features.
Basic
-
Define Steps for your Pipeline
Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or any other general task.
-
Define Tasks that rely on LLMs
Tasks are a specific type of step that rely on Large Language Models (LLMs) to generate data.
-
Define LLMs as local or remote models
LLMs are the core of your tasks. They can be backed either by local models or by remote APIs.
-
Execute Steps and Tasks in a Pipeline
The Pipeline is where you put all your steps and tasks together to create a workflow.
Advanced
-
Using the Distiset dataset object
Distiset is a dataset object, built on top of the Hugging Face datasets library, that can be used to store and manipulate data.
-
Export data to Argilla
Argilla is a platform that can be used to store and search datasets and to collect feedback on them.
-
Using a file system to pass batch data between steps
A file system can be used to pass batch data between the steps of a pipeline.
-
Using CLI to explore and re-run existing Pipelines
The CLI can be used to explore and re-run existing pipelines from the command line.
-
Cache and recover pipeline executions
Caching can be used to recover pipeline executions, so you avoid losing generated data and costly LLM calls.
-
Structured data generation
Structured data generation can be used to generate data that follows a specific structure, such as JSON or function calls.