Tasks Gallery
Category Overview
The gallery page showcases the different types of components within distilabel.
Category | Description |
---|---|
text-generation | Text generation steps are used to generate text based on a given prompt. |
chat-generation | Chat generation steps are used to generate text based on a conversation. |
text-classification | Text classification steps are used to classify text into a category. |
text-manipulation | Text manipulation steps are used to manipulate or rewrite an input text. |
evol | Evol steps are used to rewrite input text and evolve it to a higher quality. |
critique | Critique steps are used to provide feedback on the quality of the data with a written explanation. |
scorer | Scorer steps are used to evaluate and score the data with a numerical value. |
preference | Preference steps are used to collect preferences on the data with numerical values or ranks. |
embedding | Embedding steps are used to generate embeddings for the data. |
clustering | Clustering steps are used to group similar data points together. |
columns | Columns steps are used to manipulate columns in the data. |
filtering | Filtering steps are used to filter the data based on some criteria. |
format | Format steps are used to format the data. |
load | Load steps are used to load the data. |
execution | Execution steps are used to execute Python functions. |
save | Save steps are used to save the data. |
labelling | Labelling steps are used to label the data. |
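To make the categories above more concrete, the following is a minimal sketch of how steps from a few of them (load, text-generation, scorer, filtering) might be chained over rows of data. It uses only the Python standard library and does not use the distilabel API: every name in it (`load_rows`, `fake_generate`, `score_length`, `keep_high_scores`, `run_pipeline`) is hypothetical, and the "generation" step is a stand-in that never calls an LLM.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are not distilabel classes.
Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def load_rows() -> List[Row]:
    """'load' category: produce the initial rows (hard-coded here)."""
    return [
        {"instruction": "Name a prime number."},
        {"instruction": "Explain what an embedding is in one sentence."},
    ]


def fake_generate(rows: List[Row]) -> List[Row]:
    """'text-generation' category: add a generation for each prompt.

    A real step would call an LLM; this stand-in only echoes the prompt.
    """
    return [{**row, "generation": f"Answer to: {row['instruction']}"} for row in rows]


def score_length(rows: List[Row]) -> List[Row]:
    """'scorer' category: attach a numerical value (word count of the prompt)."""
    return [{**row, "score": len(str(row["instruction"]).split())} for row in rows]


def keep_high_scores(rows: List[Row], threshold: int = 5) -> List[Row]:
    """'filtering' category: keep only rows whose score meets the threshold."""
    return [row for row in rows if int(row["score"]) >= threshold]


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows


result = run_pipeline(load_rows(), [fake_generate, score_length, keep_high_scores])
for row in result:
    print(row["instruction"], row["score"])
```

Each function takes a list of rows and returns a new list, so steps from different categories can be reordered or swapped without changing the others. The components listed below play these roles with real LLM calls and richer inputs and outputs.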
-
APIGenGenerator
Generate queries and answers for the given functions in JSON format.
-
Genstruct
Generate an instruction-response pair from a document using an LLM.
-
Magpie
Generates conversations using an instruct fine-tuned LLM.
-
MathShepherdCompleter
Math Shepherd Completer and auto-labeller task.
-
MathShepherdGenerator
Math Shepherd solution generator.
-
SelfInstruct
Generate instructions based on a given input using an LLM.
-
TextGeneration
Text generation with an LLM given a prompt.
-
TextGenerationWithImage
Text generation with images with an LLM given a prompt.
-
URIAL
Generates a response using a non-instruct fine-tuned model.
-
MagpieGenerator
Generator task that generates instructions or conversations using Magpie.
-
ChatGeneration
Generates text based on a conversation.
-
ArgillaLabeller
Annotate Argilla records based on input fields, example records and question settings.
-
TextClassification
Classifies text into one or more categories or labels.
-
EvolInstruct
Evolve instructions using an LLM.
-
EvolComplexity
Evolve instructions to make them more complex using an LLM.
-
EvolQuality
Evolve the quality of the responses using an LLM.
-
EvolInstructGenerator
Generate evolved instructions using an LLM.
-
EvolComplexityGenerator
Generate evolved instructions with increased complexity using an LLM.
-
InstructionBacktranslation
Self-Alignment with Instruction Backtranslation.
-
PrometheusEval
Critique and rank the quality of generations from an LLM using Prometheus 2.0.
-
ComplexityScorer
Score instructions based on their complexity using an LLM.
-
QualityScorer
Score responses based on their quality using an LLM.
-
CLAIR
Contrastive Learning from AI Revisions (CLAIR).
-
UltraFeedback
Rank generations focusing on different aspects using an LLM.
-
PairRM
Rank the candidates based on the input using the LLM model.
-
GenerateSentencePair
Generate a positive and, optionally, a negative sentence given an anchor sentence.
-
GenerateEmbeddings
Generate embeddings using the last hidden state of an LLM.
-
TextClustering
Task that clusters a set of texts and generates summary labels for each cluster.
-
APIGenSemanticChecker
Generate queries and answers for the given functions in JSON format.
-
GenerateTextRetrievalData
Generate text retrieval data with an LLM to later on train an embedding model.
-
GenerateShortTextMatchingData
Generate short text matching data with an LLM to later on train an embedding model.
-
GenerateLongTextMatchingData
Generate long text matching data with an LLM to later on train an embedding model.
-
GenerateTextClassificationData
Generate text classification data with an LLM to later on train an embedding model.
-
StructuredGeneration
Generate structured content for a given instruction using an LLM.
-
MonolingualTripletGenerator
Generate monolingual triplets with an LLM to later on train an embedding model.
-
BitextRetrievalGenerator
Generate bitext retrieval data with an LLM to later on train an embedding model.
-
EmbeddingTaskGenerator
Generate task descriptions for embedding-related tasks using an LLM.