How-to guides
Welcome to the how-to guides section! Here you will find a collection of guides that will help you get started with Distilabel. The guides are divided into two categories: basic and advanced. The basic guides introduce the core concepts of Distilabel, while the advanced guides cover more specialized features.
Basic
- 
Define Steps for your Pipeline 
 Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or any other general task. 
- 
Define Tasks that rely on LLMs 
 Tasks are a specific type of step that relies on Large Language Models (LLMs) to generate data. 
- 
Define LLMs as local or remote models 
 LLMs are the core of your tasks. They are used to integrate with local models or remote APIs. 
- 
Execute Steps and Tasks in a Pipeline 
 A Pipeline is where you put all your steps and tasks together to create a workflow. 
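To make the relationship between steps, tasks, LLMs, and pipelines concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use Distilabel's API: `load_instructions`, `fake_llm`, `generate_text`, and `run_pipeline` are hypothetical names chosen for illustration, and the "LLM" is a stub that returns a canned string. The linked guides describe the actual classes.

```python
from typing import Callable, Dict, Iterable, List

# A row is a dictionary of column name -> value; a step maps rows to rows.
Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def load_instructions(_: List[Row]) -> List[Row]:
    """Generator-like step: ignores its input and emits the initial rows."""
    return [
        {"instruction": "Name a primary color."},
        {"instruction": "Define entropy."},
    ]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a local model or a remote API call."""
    return f"[generated answer to: {prompt}]"


def generate_text(rows: List[Row]) -> List[Row]:
    """Task-like step: relies on the (stubbed) LLM to add a new column."""
    return [{**row, "generation": fake_llm(row["instruction"])} for row in rows]


def run_pipeline(steps: Iterable[Step]) -> List[Row]:
    """Pipeline-like runner: feeds the output of each step into the next."""
    rows: List[Row] = []
    for step in steps:
        rows = step(rows)
    return rows


result = run_pipeline([load_instructions, generate_text])
for row in result:
    print(row)
```

The point of the sketch is the division of labor: ordinary steps transform rows, a task is a step that happens to call an LLM, and the pipeline only decides the order in which steps run.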
Advanced
- 
Using the Distiset dataset object 
 Distiset is a dataset object based on the datasets library that can be used to store and manipulate data. 
- 
Export data to Argilla 
 Argilla is a platform that can be used to store, search, and apply feedback to datasets. 
- 
Using a file system to pass batch data between steps 
 A file system can be used to pass data between steps in a pipeline. 
- 
Using CLI to explore and re-run existing Pipelines 
 The CLI can be used to explore and re-run existing pipelines from the command line. 
- 
Cache and recover pipeline executions 
 Caching can be used to recover pipeline executions to avoid losing data and precious LLM calls. 
- 
Structured data generation 
 Structured data generation can be used to generate data with a specific structure like JSON, function calls, etc. 
- 
Serving an LLM for sharing it between several tasks 
 Serve an LLM via TGI or vLLM to make requests and connect using a client like InferenceEndpointsLLM or OpenAILLM to avoid wasting resources.
- 
Impose requirements on your pipelines and steps 
 Add requirements to steps in a pipeline to ensure the required dependencies are installed and avoid errors.
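As an illustration of why caching saves LLM calls when a pipeline is re-run, the following is a minimal sketch of the general idea in plain Python. It is not Distilabel's caching mechanism: `CachedLLM`, `fake_llm`, and the one-JSON-file-per-prompt layout are assumptions made for this example only.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an expensive LLM call."""
    return prompt.upper()


class CachedLLM:
    """Wraps an LLM callable and stores each generation on disk (illustrative only)."""

    def __init__(self, llm, cache_dir):
        self.llm = llm
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.calls = 0  # number of real LLM calls made by this instance

    def _path(self, prompt: str) -> Path:
        # Key the cache entry on a hash of the prompt.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.json"

    def generate(self, prompt: str) -> str:
        path = self._path(prompt)
        if path.exists():
            # Recovered from a previous execution: no LLM call needed.
            return json.loads(path.read_text(encoding="utf-8"))["generation"]
        self.calls += 1
        generation = self.llm(prompt)
        path.write_text(
            json.dumps({"prompt": prompt, "generation": generation}),
            encoding="utf-8",
        )
        return generation


cache_dir = tempfile.mkdtemp()

# First execution: every prompt triggers a real call.
first_run = CachedLLM(fake_llm, cache_dir)
first = [first_run.generate(p) for p in ["hello", "world"]]

# Second execution (e.g. after a crash): only the new prompt triggers a call.
second_run = CachedLLM(fake_llm, cache_dir)
second = [second_run.generate(p) for p in ["hello", "world", "again"]]

print(first_run.calls, second_run.calls)
```

Because the second run shares the cache directory with the first, it reuses the stored generations and only pays for the prompt it has not seen before.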