Skip to content

Generator

EvolComplexityGenerator

Bases: EvolInstructGenerator

EvolComplexity is a task that evolves instructions to make them more complex, and it is based in the EvolInstruct task, but using slight different prompts, but the exact same evolutionary approach.

Attributes:

Name Type Description
num_instructions

The number of instructions to be generated.

generate_answers

Whether to generate answers for the instructions or not. Defaults to False.

mutation_templates Dict[str, str]

The mutation templates to be used for the generation of the instructions.

min_length Dict[str, str]

Defines the length (in bytes) that the generated instruction needs to be higher than, to be considered valid. Defaults to 512.

max_length Dict[str, str]

Defines the length (in bytes) that the generated instruction needs to be lower than, to be considered valid. Defaults to 1024.

seed Dict[str, str]

The seed to be set for numpy in order to randomly pick a mutation method. Defaults to 42.

Runtime parameters
  • min_length: Defines the length (in bytes) that the generated instruction needs to be higher than, to be considered valid.
  • max_length: Defines the length (in bytes) that the generated instruction needs to be lower than, to be considered valid.
  • seed: The number of evolutions to be run.
Input columns
  • instruction (str): The instruction to evolve.
Output columns
  • instruction (str): The evolved instruction.
  • answer (str, optional): The answer to the instruction if generate_answers=True.
  • model_name (str): The name of the LLM used to evolve the instructions.
References
Source code in src/distilabel/steps/tasks/evol_instruct/evol_complexity/generator.py
class EvolComplexityGenerator(EvolInstructGenerator):
    """EvolComplexity is a task that evolves instructions to make them more complex,
    and it is based in the EvolInstruct task, but using slight different prompts, but the
    exact same evolutionary approach.

    Attributes:
        num_instructions: The number of instructions to be generated.
        generate_answers: Whether to generate answers for the instructions or not. Defaults
            to `False`.
        mutation_templates: The mutation templates to be used for the generation of the
            instructions.
        min_length: Defines the length (in bytes) that the generated instruction needs to
            be higher than, to be considered valid. Defaults to `512`.
        max_length: Defines the length (in bytes) that the generated instruction needs to
            be lower than, to be considered valid. Defaults to `1024`.
        seed: The seed to be set for `numpy` in order to randomly pick a mutation method.
            Defaults to `42`.

    Runtime parameters:
        - `min_length`: Defines the length (in bytes) that the generated instruction needs to be higher than, to be considered valid.
        - `max_length`: Defines the length (in bytes) that the generated instruction needs to be lower than, to be considered valid.
        - `seed`: The number of evolutions to be run.

    Input columns:
        - instruction (`str`): The instruction to evolve.

    Output columns:
        - instruction (`str`): The evolved instruction.
        - answer (`str`, optional): The answer to the instruction if `generate_answers=True`.
        - model_name (`str`): The name of the LLM used to evolve the instructions.

    References:
        - [What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning](https://arxiv.org/abs/2312.15685)
        - [WizardLM: Empowering Large Language Models to Follow Complex Instructions](https://arxiv.org/abs/2304.12244)
    """

    mutation_templates: Dict[str, str] = GENERATION_MUTATION_TEMPLATES