Self instruct
SelfInstruct
¶
Bases: Task
SelfInstruct is a pre-defined task that, given a number of instructions, a certain criteria for query generations, an application description, and an input, generates a number of instruction related to the given input and following what is stated in the criteria for query generation and the application description. It is based in the SelfInstruct framework from the paper "Self-Instruct: Aligning Language Models with Self-Generated Instructions".
Attributes:
Name | Type | Description |
---|---|---|
num_instructions |
int
|
The number of instructions to be generated. Defaults to 5. |
criteria_for_query_generation |
str
|
The criteria for the query generation. Defaults to the criteria defined within the paper. |
application_description |
str
|
The description of the AI application that one want
to build with these instructions. Defaults to |
Input columns
- input (
str
): The input to generate the instructions. It's also called seed in the paper.
Output columns
- instructions (
List[str]
): The generated instructions.
Source code in src/distilabel/steps/tasks/self_instruct.py
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 |
|
inputs: List[str]
property
¶
The input for the task is the input
i.e. seed text.
outputs
property
¶
The output for the task is a list of instructions
containing the generated instructions.
format_input(input)
¶
The input is formatted as a ChatType
assuming that the instruction
is the first interaction from the user within a conversation.
Source code in src/distilabel/steps/tasks/self_instruct.py
format_output(output, input=None)
¶
The output is formatted as a list with the generated instructions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output |
Union[str, None]
|
the raw output of the LLM. |
required |
input |
Optional[Dict[str, Any]]
|
the input to the task. Used for obtaining the number of responses. |
None
|
Returns:
Type | Description |
---|---|
Dict[str, Any]
|
A dict with containing the generated instructions. |
Source code in src/distilabel/steps/tasks/self_instruct.py
load()
¶
Loads the Jinja2 template for SelfInstruct.