Offline Batch Generation
Offline batch generation is a feature that some of the `LLM`s implemented in distilabel offer. It allows sending the inputs to an LLM-as-a-service platform and waiting for the outputs in an asynchronous manner. LLM-as-a-service platforms offer this feature because it lets them gather many inputs and create batches as big as the hardware allows, maximizing hardware utilization and reducing the cost of the service. In exchange, the user has to wait a certain amount of time for the outputs to be ready, but the cost per token is usually much lower.
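The same mechanism can also be observed outside of a pipeline by calling the `LLM` directly. The following is a minimal sketch, not the canonical usage: it assumes the exception can be imported from `distilabel.exceptions` and that an `OPENAI_API_KEY` environment variable is set.

```python
from distilabel.models import OpenAILLM

# Assumption: the exception lives in `distilabel.exceptions`; adjust the import
# to your distilabel version if needed.
from distilabel.exceptions import DistilabelOfflineBatchGenerationNotFinishedException

llm = OpenAILLM(
    model="gpt-3.5-turbo",
    use_offline_batch_generation=True,  # send the inputs as an offline batch job
)
llm.load()

try:
    # First call: the inputs are uploaded to the platform as a batch job and the
    # exception below is raised. Later calls: the outputs are returned once the
    # job has finished.
    outputs = llm.generate(
        inputs=[[{"role": "user", "content": "What's 2 + 2?"}]],
    )
    print(outputs)
except DistilabelOfflineBatchGenerationNotFinishedException:
    # The job ids are stored in the `jobs_ids` attribute so the results can be
    # retrieved later.
    print(f"Batch job not finished yet, job ids: {llm.jobs_ids}")
```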
distilabel pipelines are able to handle LLMs that offer this feature in the following way:
- The first time the pipeline gets executed, the `LLM` will send the inputs to the platform. The platform will return job ids that can be used later to check the status of the jobs and retrieve the results. The `LLM` will save these job ids in its `jobs_ids` attribute and raise a special exception `DistilabelOfflineBatchGenerationNotFinishedException` that will be handled by the `Pipeline`. The job ids will be saved in the pipeline cache, so they can be used in subsequent calls.
- The second time and subsequent calls will recover the pipeline execution and the `LLM` won't send the inputs again to the platform. This time, as it has the `jobs_ids`, it will check if the jobs have finished, and if they have it will retrieve the results and return the outputs. If they haven't finished, it will raise `DistilabelOfflineBatchGenerationNotFinishedException` again.
- In addition, `LLM`s with offline batch generation can be configured to poll until the jobs have finished, blocking the pipeline until they are done (see the sketch after this list). If for some reason the polling needs to be stopped, one can press Ctrl+C or Cmd+C depending on the OS (or send a `SIGINT` to the main process), which will stop the polling and raise `DistilabelOfflineBatchGenerationNotFinishedException`, which will be handled by the pipeline as described above.
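A minimal sketch of enabling the polling behaviour mentioned in the last point, assuming the attribute controlling it is called `offline_batch_generation_block_until_done` (check the `LLM` reference of your distilabel version for the exact name):

```python
from distilabel.models import OpenAILLM

llm = OpenAILLM(
    model="gpt-3.5-turbo",
    use_offline_batch_generation=True,
    # Assumed attribute name: poll the platform and block until the batch jobs
    # are done instead of raising the exception immediately.
    offline_batch_generation_block_until_done=True,
)
```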
Warning
In order to recover the pipeline execution and retrieve the results, the pipeline cache must be enabled. If the pipeline cache is disabled, the inputs will be sent again, creating different jobs and incurring extra costs.
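In practice this just means not disabling the cache when calling `Pipeline.run`. A small sketch, assuming `use_cache` (which defaults to `True`) is the argument controlling the cache and reusing the `pipeline` defined in the example below:

```python
# Cache enabled (the default): the job ids are stored and recovered between runs.
distiset = pipeline.run()

# Cache disabled: the inputs would be sent again and new jobs would be created,
# incurring extra costs, so avoid this when using offline batch generation.
# distiset = pipeline.run(use_cache=False)
```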
Example pipeline using `OpenAILLM` with offline batch generation
```python
from distilabel.models import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub
from distilabel.steps.tasks import TextGeneration

with Pipeline() as pipeline:
    load_data = LoadDataFromHub(output_mappings={"prompt": "instruction"})

    text_generation = TextGeneration(
        llm=OpenAILLM(
            model="gpt-3.5-turbo",
            use_offline_batch_generation=True,  # (1)
        )
    )

    load_data >> text_generation


if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            load_data.name: {
                "repo_id": "distilabel-internal-testing/instruction-dataset",
                "split": "test",
                "batch_size": 500,
            },
        }
    )
```
1. Indicate that the `OpenAILLM` should use offline batch generation.
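When executing this script, the first run will create the batch jobs, store the job ids in the cache and stop; re-running the same script later will recover those job ids and, once the jobs are done, return the generated dataset. A hedged sketch of that flow from the user's side (the exception raised by the `LLM` is handled internally by the pipeline, so no extra error handling is needed; the repo id used for pushing is hypothetical):

```python
# The exact same `pipeline.run(...)` call is executed again in a later run of the
# script. Thanks to the cached job ids, the inputs are NOT sent again.
distiset = pipeline.run(
    parameters={
        load_data.name: {
            "repo_id": "distilabel-internal-testing/instruction-dataset",
            "split": "test",
            "batch_size": 500,
        },
    }
)

# Once the platform has finished the jobs, `distiset` contains the generated
# outputs and can be used as usual, e.g. pushed to the Hub (hypothetical repo id):
distiset.push_to_hub("my-username/instruction-dataset-with-generations")
```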