Skip to content

Steps Gallery

  • DeitaFiltering


    Filter dataset rows using DEITA filtering strategy.

    DeitaFiltering

  • FaissNearestNeighbour


    Create a faiss index to get the nearest neighbours.

    FaissNearestNeighbour

  • EmbeddingDedup


    Deduplicates text using embeddings.

    EmbeddingDedup

  • PushToHub


    Push data to a Hugging Face Hub dataset.

    PushToHub

  • PreferenceToArgilla


    Creates a preference dataset in Argilla.

    PreferenceToArgilla

  • TextGenerationToArgilla


    Creates a text generation dataset in Argilla.

    TextGenerationToArgilla

  • CombineOutputs


    Combine the outputs of several upstream steps.

    CombineOutputs

  • ExpandColumns


    Expand columns that contain lists into multiple rows.

    ExpandColumns

  • GroupColumns


    Combines columns from a list of StepInput.

    GroupColumns

  • CombineColumns


    CombineColumns is deprecated and will be removed in version 1.5.0, use GroupColumns instead.

    CombineColumns

  • KeepColumns


    Keeps selected columns in the dataset.

    KeepColumns

  • MergeColumns


    Merge columns from a row.

    MergeColumns

  • EmbeddingGeneration


    Generate embeddings using an Embeddings model.

    EmbeddingGeneration

  • MinHashDedup


    Deduplicates text using MinHash and MinHashLSH.

    MinHashDedup

  • ConversationTemplate


    Generate a conversation template from an instruction and a response.

    ConversationTemplate

  • FormatTextGenerationDPO


    Format the output of your LLMs for Direct Preference Optimization (DPO).

    FormatTextGenerationDPO

  • FormatChatGenerationDPO


    Format the output of a combination of a ChatGeneration + a preference task for Direct Preference Optimization (DPO).

    FormatChatGenerationDPO

  • FormatTextGenerationSFT


    Format the output of a TextGeneration task for Supervised Fine-Tuning (SFT).

    FormatTextGenerationSFT

  • FormatChatGenerationSFT


    Format the output of a ChatGeneration task for Supervised Fine-Tuning (SFT).

    FormatChatGenerationSFT

  • RewardModelScore


    Assign a score to a response using a Reward Model.

    RewardModelScore

  • TruncateTextColumn


    Truncate a row using a tokenizer or the number of characters.

    TruncateTextColumn

  • LoadDataFromDicts


    Loads a dataset from a list of dictionaries.

    LoadDataFromDicts

  • LoadDataFromHub


    Loads a dataset from the Hugging Face Hub.

    LoadDataFromHub

  • LoadDataFromFileSystem


    Loads a dataset from a file in your filesystem.

    LoadDataFromFileSystem

  • LoadDataFromDisk


    Load a dataset that was previously saved to disk.

    LoadDataFromDisk