Pipeline Samples

  • Tutorials provide detailed step-by-step explanations and the code used for end-to-end workflows.
  • Paper implementations provide reproductions of fundamental papers in the synthetic data domain.
  • Examples show code for different tasks without detailed explanations.

Tutorials

  • Retrieval and reranking models


    Learn about synthetic data generation for fine-tuning custom retrieval and reranking models.

    Tutorial

Paper Implementations

  • DEITA


    Learn about tuning prompts and responses for complexity and quality, and about using LLMs as judges for automatic data selection.

    Paper

  • Instruction Backtranslation


    Learn about automatically labeling human-written text with corresponding instructions.

    Paper

  • Prometheus 2


    Learn about using open-source models as judges for direct assessment and pairwise ranking.

    Paper

  • UltraFeedback


    Learn about a large-scale, fine-grained, diverse preference dataset, used for training powerful reward and critic models.

    Paper

Examples

  • Benchmarking with distilabel


    Learn about reproducing the Arena Hard benchmark with distilabel.

    Example

  • llama.cpp with outlines


    Learn about generating RPG characters following a pydantic.BaseModel with outlines in distilabel.

    Example

  • MistralAI with instructor


    Learn about answering instructions with knowledge graphs defined as pydantic.BaseModel objects using instructor in distilabel.

    Example