SelfInstruct
Generate instructions based on a given input using an LLM.
SelfInstruct is a predefined task that takes an input together with a number of instructions, criteria for query generation, and an application description. It generates that number of instructions related to the input, following the criteria for query generation and the application description.
It is based on the Self-Instruct framework from the paper "Self-Instruct: Aligning Language Models with Self-Generated Instructions".
Attributes
- num_instructions: The number of instructions to generate. Defaults to 5.
- criteria_for_query_generation: The criteria used for query generation. Defaults to the criteria defined in the paper.
- application_description: A description of the AI application that the instructions are meant to serve. Defaults to "AI assistant".
Input & Output Columns
Inputs
- input (str): The input used to generate the instructions. The paper calls it the seed.
Outputs
- instructions (List[str]): The generated instructions.
- model_name (str): The name of the model used to generate the instructions.
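The sketch below shows one way a raw LLM completion could be turned into the two output columns. It assumes the model writes one instruction per line, possibly prefixed by a list marker such as "1.", "2)" or "-". `parse_instructions` and `to_output_row` are hypothetical helpers, and returning an empty list for a missing completion is a choice made here. SelfInstruct performs its own output formatting internally, and its exact parsing rules may differ.

```python
import re
from typing import Any, Dict, List, Optional

# Assumed list markers ("1.", "2)", "-", "*") followed by whitespace.
# Requiring whitespace keeps text such as "3.5 million" intact.
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def parse_instructions(completion: Optional[str]) -> List[str]:
    """Split a raw completion into one instruction per non-empty line."""
    if completion is None:
        # Hypothetical choice: treat a failed generation as "no instructions".
        return []
    instructions = []
    for line in completion.splitlines():
        cleaned = _MARKER.sub("", line).strip()
        if cleaned:
            instructions.append(cleaned)
    return instructions


def to_output_row(
    row: Dict[str, Any], completion: Optional[str], model_name: str
) -> Dict[str, Any]:
    """Return the input row extended with the `instructions` and `model_name` columns."""
    return {
        **row,
        "instructions": parse_instructions(completion),
        "model_name": model_name,
    }


if __name__ == "__main__":
    raw = "1. What is photosynthesis?\n2) Explain the Calvin cycle.\n- Why are leaves green?"
    print(to_output_row({"input": "plants"}, raw, "example-model"))
```

The input row is copied unchanged and the new columns are added alongside it, which matches the shape of the result shown in the example below.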
Examples
Generate instructions based on a given input
from distilabel.steps.tasks import SelfInstruct
from distilabel.llms.huggingface import InferenceEndpointsLLM
self_instruct = SelfInstruct(
llm=InferenceEndpointsLLM(
model_id="mistralai/Mistral-7B-Instruct-v0.2",
),
num_instructions=5, # This is the default value
)
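# Prepare the task and its LLM before processing any data
self_instruct.load()
# process() is a generator; next() returns the first batch of output rows
result = next(self_instruct.process([{"input": "instruction"}]))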
# result
# [
# {
# 'input': 'instruction',
# 'model_name': 'mistralai/Mistral-7B-Instruct-v0.2',
# 'instructions': ["instruction 1", "instruction 2", "instruction 3", "instruction 4", "instruction 5"],
# }
# ]