Skip to content

Tasks Gallery

Category Overview

The gallery page showcases the different types of components within distilabel.

Icon Category Description
text-generation Text generation steps are used to generate text based on a given prompt.
chat-generation Chat generation steps are used to generate text based on a conversation.
text-classification Text classification steps are used to classify text into a category.
text-manipulation Text manipulation steps are used to manipulate or rewrite an input text.
evol Evol steps are used to rewrite input text and evolve it to a higher quality.
critique Critique steps are used to provide feedback on the quality of the data with a written explanation.
scorer Scorer steps are used to evaluate and score the data with a numerical value.
preference Preference steps are used to collect preferences on the data with numerical values or ranks.
embedding Embedding steps are used to generate embeddings for the data.
clustering Clustering steps are used to group similar data points together.
columns Columns steps are used to manipulate columns in the data.
filtering Filtering steps are used to filter the data based on some criteria.
format Format steps are used to format the data.
load Load steps are used to load the data.
save Save steps are used to save the data.
  • Genstruct


    Generate a pair of instruction-response from a document using an LLM.

    Genstruct

  • Magpie


    Generates conversations using an instruct fine-tuned LLM.

    Magpie

  • SelfInstruct


    Generate instructions based on a given input using an LLM.

    SelfInstruct

  • TextGeneration


    Text generation with an LLM given a prompt.

    TextGeneration

  • URIAL


    Generates a response using a non-instruct fine-tuned model.

    URIAL

  • MagpieGenerator


    Generator task the generates instructions or conversations using Magpie.

    MagpieGenerator

  • ChatGeneration


    Generates text based on a conversation.

    ChatGeneration

  • TextClassification


    Classifies text into one or more categories or labels.

    TextClassification

  • EvolInstruct


    Evolve instructions using an LLM.

    EvolInstruct

  • EvolComplexity


    Evolve instructions to make them more complex using an LLM.

    EvolComplexity

  • EvolQuality


    Evolve the quality of the responses using an LLM.

    EvolQuality

  • EvolInstructGenerator


    Generate evolved instructions using an LLM.

    EvolInstructGenerator

  • EvolComplexityGenerator


    Generate evolved instructions with increased complexity using an LLM.

    EvolComplexityGenerator

  • InstructionBacktranslation


    Self-Alignment with Instruction Backtranslation.

    InstructionBacktranslation

  • PrometheusEval


    Critique and rank the quality of generations from an LLM using Prometheus 2.0.

    PrometheusEval

  • ComplexityScorer


    Score instructions based on their complexity using an LLM.

    ComplexityScorer

  • QualityScorer


    Score responses based on their quality using an LLM.

    QualityScorer

  • UltraFeedback


    Rank generations focusing on different aspects using an LLM.

    UltraFeedback

  • PairRM


    Rank the candidates based on the input using the LLM model.

    PairRM

  • GenerateSentencePair


    Generate a positive and negative (optionally) sentences given an anchor sentence.

    GenerateSentencePair

  • GenerateEmbeddings


    Generate embeddings using the last hidden state of an LLM.

    GenerateEmbeddings

  • TextClustering


    Task that clusters a set of texts and generates summary labels for each cluster.

    TextClustering

  • TextClustering


    Task that clusters a set of texts and generates summary labels for each cluster.

    TextClustering

  • GenerateTextRetrievalData


    Generate text retrieval data with an LLM to later on train an embedding model.

    GenerateTextRetrievalData

  • GenerateShortTextMatchingData


    Generate short text matching data with an LLM to later on train an embedding model.

    GenerateShortTextMatchingData

  • GenerateLongTextMatchingData


    Generate long text matching data with an LLM to later on train an embedding model.

    GenerateLongTextMatchingData

  • GenerateTextClassificationData


    Generate text classification data with an LLM to later on train an embedding model.

    GenerateTextClassificationData

  • StructuredGeneration


    Generate structured content for a given instruction using an LLM.

    StructuredGeneration

  • MonolingualTripletGenerator


    Generate monolingual triplets with an LLM to later on train an embedding model.

    MonolingualTripletGenerator

  • BitextRetrievalGenerator


    Generate bitext retrieval data with an LLM to later on train an embedding model.

    BitextRetrievalGenerator

  • EmbeddingTaskGenerator


    Generate task descriptions for embedding-related tasks using an LLM.

    EmbeddingTaskGenerator