LLMs¶
In this section we will see what an LLM is and the different LLM implementations available in distilabel.
LLM¶
The LLM class encapsulates the functionality for interacting with a large language model.
It distinguishes between task specifications and configurable parameters that influence the LLM behavior.
For illustration purposes, we employ the TextGenerationTask in this section and guide you to the dedicated Tasks section for comprehensive details.
LLM classes share several general parameters and define implementation-specific ones. Let's first go through the general parameters and the generate method, and then the specifics of each class.
General parameters¶
Let's briefly introduce the general parameters we may find:
- max_new_tokens: this parameter controls the maximum number of tokens the LLM is allowed to generate.
- temperature: parameter associated with the creativity of the model; a value close to 0 makes the model more deterministic, while higher values make it more "creative".
- top_k and top_p: top_k limits the number of tokens the model is allowed to consider when generating the next token, sorted by probability, while top_p limits the candidate tokens in terms of the sum of their probabilities.
- frequency_penalty and presence_penalty: the frequency penalty penalizes tokens that have already appeared in the generated text, limiting the possibility of those appearing again, while the presence penalty penalizes regardless of the frequency.
- num_threads: some LLMs work better when using several threads to generate text; this parameter allows specifying the number of threads to use.
- prompt_format and prompt_formatting_fn: these two parameters allow tweaking the prompt of our models. prompt_format directs the LLM to format the prompt according to one of the predefined formats, while prompt_formatting_fn allows passing a function that will be applied to the prompt before generation, for extra control over what we feed to the model (see the sketch after this list).
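As a rough sketch of how several of these general parameters might be combined in practice (the exact set of accepted arguments varies per LLM subclass, and the custom formatting function below, including the prompt attributes it reads, is a hypothetical example):
import os

from distilabel.llm import OpenAILLM
from distilabel.tasks import TextGenerationTask

# Hypothetical custom formatter: it receives the prompt built by the task and
# returns the final string sent to the model (attribute names are assumptions).
def custom_format(prompt) -> str:
    return f"{prompt.system_prompt}\n\n{prompt.formatted_prompt}"

llm = OpenAILLM(
    model="gpt-3.5-turbo",
    task=TextGenerationTask(),
    max_new_tokens=256,
    temperature=0.3,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    num_threads=2,
    prompt_formatting_fn=custom_format,  # instead of a predefined prompt_format
    api_key=os.getenv("OPENAI_API_KEY", None),
)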
Besides the general parameters, some LLM subclasses also have implementation-specific parameters to control the text generation, but those will be explained in the corresponding sections.
generate method¶
Once you create an LLM, you use the generate method to interact with it. This method accepts two parameters:
- inputs: a list of dictionaries containing the inputs for the LLM and the Task. Each dictionary must have all the keys required by the Task.
- num_generations: an integer used to specify how many text generations we want to obtain for each element in inputs.
The output of the method will be a list containing lists of LLMOutput. Each inner list is associated with the corresponding input in inputs, and each LLMOutput corresponds to one of the num_generations for each input.
>>> llm.generate(inputs=[...], num_generations=2)
[ # (1)
    [ # (2)
        { # (3)
            "model_name": "notus-7b-v1",
            "prompt_used": "Write a letter for my friend Bob...",
            "raw_output": "Dear Bob, ...",
            "parsed_output": {
                "generations": "Dear Bob, ...",
            }
        },
        {
            "model_name": "notus-7b-v1",
            "prompt_used": "Write a letter for my friend Bob...",
            "raw_output": "Dear Bob, ...",
            "parsed_output": {
                "generations": "Dear Bob, ...",
            }
        },
    ],
    [...],
]
1. The outer list will contain as many lists as elements in inputs.
2. The inner lists will contain as many LLMOutputs as specified in num_generations.
3. Each LLMOutput is a dictionary.
The LLMOutput is a TypedDict containing the keys model_name, prompt_used, raw_output and parsed_output. The parsed_output key is a dictionary that will contain all the Task outputs.
{
    "model_name": "notus-7b-v1",
    "prompt_used": "Write a letter for my friend Bob...",
    "raw_output": "Dear Bob, ...",
    "parsed_output": { # (1)
        "generations": "Dear Bob, ...",
    }
},
1. The keys contained in parsed_output will depend on the Task used. In this case, we used TextGenerationTask, so the key generations is present.
If the LLM uses a thread pool, then the output of the generate method will be a Future whose result is a list of lists of LLMOutput, as described above.
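For instance, a minimal sketch of consuming that Future, assuming the LLM above was created with num_threads so that generate returns a Future:
# `generate` returns a Future when the LLM uses a thread pool; `.result()`
# blocks until the generations are ready.
future = llm.generate(inputs=[{"input": "Write a haiku about data."}], num_generations=1)
outputs = future.result()  # list of lists of LLMOutput, as described above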
Validate prompts¶
Before calling the LLM with your dataset, we can take a look at the prompts that will be sent to the engine without actually making the call, to check that the data is as expected. The following examples show two different LLM cases; just take into account that the input will have to be in the format expected by the Task:
import os
from distilabel.tasks import EvolInstructTask
from distilabel.llm import InferenceEndpointsLLM
task = EvolInstructTask()
llm = InferenceEndpointsLLM(
task=task,
endpoint_name_or_model_id="aws-notus-7b-v1-3184",
endpoint_namespace="argilla",
token=os.getenv("HF_API_TOKEN", None),
prompt_format="notus"
)
print(llm.validate_prompts([{"input": "What's a large language model?"}])[0])
# <|system|>
# </s>
# <|user|>
# I want you to act as a Prompt Rewriter.
# Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., chatgpt and GPT4) a bit harder to handle.
# But the rewritten prompt must be reasonable and must be understood and responded by humans.
# Your rewriting cannot omit the non-text parts such as the table and code in #The Given Prompt#:. Also, please do not omit the input in #The Given Prompt#.
# You SHOULD complicate the given prompt using the following method:
# Please add one more constraints/requirements into #The Given Prompt#
# You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #The Given Prompt#.
# '#The Given Prompt#', '#Rewritten Prompt#', 'given prompt' and 'rewritten prompt' are not allowed to appear in #Rewritten Prompt#
# #The Given Prompt#:
# What's a large language model?
# #Rewritten Prompt#:
# </s>
# <|assistant|>
import os
from distilabel.llm import OpenAILLM
from distilabel.tasks import JudgeLMTask
llm = OpenAILLM(
task=JudgeLMTask(),
openai_api_key=os.getenv("OPENAI_API_KEY", None),
temperature=0.3,
)
print(
llm.validate_prompts(
[
{
"input": "What's a large language model?",
"generations": [
"A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text based on vast amounts of training data.",
"Sorry I cannot answer that."
]
}
]
)[0]
)
# You are a helpful and precise assistant for checking the quality of the answer.
# [Question]
# What's a large language model?
# [The Start of Assistant 1's Answer>
# A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text based on vast amounts of training data.
# [The End of Assistant 1's Answer>
# [The Start of Assistant 2's Answer>
# Sorry I cannot answer that.
# [The End of Assistant 2's Answer>
# [System]
# We would like to request your feedback on the performance of 2 AI assistants in response to the user question displayed above.
# Please rate the helpfulness, relevance, accuracy, level of details of their responses. Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.
# Please first output a single line containing only 2 values indicating the scores for Assistants 1 to 2, respectively. The 2 scores are separated by a space. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
Integrations¶
OpenAI¶
These may be the default choice for your ambitious tasks.
For the API reference visit OpenAILLM.
import os
from distilabel.llm import OpenAILLM
from distilabel.tasks import TextGenerationTask
openaillm = OpenAILLM(
model="gpt-3.5-turbo",
task=TextGenerationTask(),
prompt_format="openai",
max_new_tokens=256,
api_key=os.getenv("OPENAI_API_KEY", None),
temperature=0.3,
)
result = openaillm.generate([{"input": "What is OpenAI?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# OpenAI is an artificial intelligence research laboratory and company. It was founded
# with the goal of ensuring that artificial general intelligence (AGI) benefits all of
# humanity. OpenAI conducts cutting-edge research in various fields of AI ...
To generate JSON objects with OpenAI's json_response feature you can use the JSONOpenAILLM class:
import os
from distilabel.llm import JSONOpenAILLM
from distilabel.tasks import TextGenerationTask
openaillm = JSONOpenAILLM(
model="gpt-3.5-turbo-1106", # json response is a limited feature
task=TextGenerationTask(),
prompt_format="openai",
max_new_tokens=256,
api_key=os.getenv("OPENAI_API_KEY", None),
temperature=0.3,
)
result = openaillm.generate(
[{"input": "write a json object with a key 'city' and value 'Madrid'"}]
)
# >>> print(result[0][0]["parsed_output"]["generations"])
# {"answer": "Madrid"}
Refer to the OpenAI API documentation for more information.
Llama.cpp¶
Applicable for local execution of Language Models (LLMs). Use this LLM when you have access to the quantized weights of your selected model for interaction.
Let's see an example using notus-7b-v1. First, download the quantized weights of the model in GGUF format.
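One way to fetch the weights programmatically is sketched below with huggingface_hub; the repository id and filename are assumptions, so point them to wherever your GGUF file is actually hosted:
from huggingface_hub import hf_hub_download

# Hypothetical repository and filename for the quantized GGUF weights.
model_path = hf_hub_download(
    repo_id="TheBloke/notus-7B-v1-GGUF",
    filename="notus-7b-v1.Q4_K_M.gguf",
)
The returned path can then be used as the model_path when instantiating Llama below.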
from distilabel.llm import LlamaCppLLM
from distilabel.tasks import TextGenerationTask
from llama_cpp import Llama
# Instantiate our LLM with them:
llm = LlamaCppLLM(
model=Llama(model_path="./notus-7b-v1.q4_k_m.gguf", n_gpu_layers=-1),
task=TextGenerationTask(),
max_new_tokens=128,
temperature=0.3,
prompt_format="notus",
)
result = llm.generate([{"input": "What is the capital of Spain?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# The capital of Spain is Madrid. It is located in the center of the country and
# is known for its vibrant culture, beautiful architecture, and delicious food.
# Madrid is home to many famous landmarks such as the Prado Museum, Retiro Park,
# and the Royal Palace of Madrid. I hope this information helps!
For the API reference visit LlamaCppLLM.
vLLM¶
Highly recommended to use if you have a GPU available, as it is the fastest solution out there for batch generation. Find more information about it in vLLM docs.
from distilabel.llm import vLLM
from distilabel.tasks import TextGenerationTask
from vllm import LLM
llm = vLLM(
model=LLM(model="argilla/notus-7b-v1"),
task=TextGenerationTask(),
max_new_tokens=512,
temperature=0.3,
prompt_format="notus",
)
result = llm.generate([{"input": "What's a large language model?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# A large language model is a type of artificial intelligence (AI) system that is designed
# to understand and interpret human language. It is called "large" because it uses a vast
# amount of data, typically billions of words or more, to learn and make predictions about
# language. Large language models are ...
For the API reference visit vLLM.
Ollama¶
Highly recommended to use if you have a GPU available, as it is one of the fastest solutions out there and also has Metal support for Apple Silicon (M1 and later) on macOS. Find more information about it in the Ollama GitHub.
Before being able to use Ollama you first need to install it. After that, you can select one of the models from their model library and use it as follows:
Note
The ollama run <model_name> command will also set pre-defined generation parameters for the model. These can be found in their library and overridden by passing them as arguments to the command as shown here.
We can then re-use this model name as a reference within distilabel through our OllamaLLM implementation:
from distilabel.llm import OllamaLLM
from distilabel.tasks import TextGenerationTask
llm = OllamaLLM(
model="notus", # should be deployed via `ollama run notus:7b-v1-q5_K_M`
task=TextGenerationTask(),
prompt_format="openai",
)
result = llm.generate([{"input": "What's a large language model?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# A large language model is a type of artificial intelligence (AI) system that has been trained
# on a vast amount of text data to generate human-like language. These models are capable of
# understanding and generating complex sentences, and can be used for tasks such as language
# translation, text summarization, and natural language generation. They are typically very ...
HuggingFace LLMs¶
This section explains two different ways to use HuggingFace models:
Transformers¶
This is the option to run a model hosted on the HuggingFace Hub locally via transformers. Load the model and tokenizer in the standard manner, and proceed to instantiate the TransformersLLM class.
For the API reference visit TransformersLLM.
Let's see an example using notus-7b-v1:
from distilabel.llm import TransformersLLM
from distilabel.tasks import TextGenerationTask
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the models from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("argilla/notus-7b-v1")
model = AutoModelForCausalLM.from_pretrained("argilla/notus-7b-v1", device_map="auto")
# Instantiate our LLM with them:
llm = TransformersLLM(
model=model,
tokenizer=tokenizer,
task=TextGenerationTask(),
max_new_tokens=128,
temperature=0.3,
prompt_format="notus",
)
result = llm.generate([{"input": "What's a large language model?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# A large language model is a type of machine learning algorithm that is designed to analyze
# and understand large amounts of text data. It is called "large" because it requires a
# vast amount of data to train and improve its accuracy. These models are ...
Inference Endpoints¶
HuggingFace provides a streamlined approach for deploying models through Inference Endpoints on their infrastructure. Opt for this solution if your model is hosted on the HuggingFace Hub.
For the API reference visit InferenceEndpointsLLM.
Let's see how to interact with these LLMs:
import os
from distilabel.llm import InferenceEndpointsLLM
from distilabel.tasks import TextGenerationTask
endpoint_name = os.getenv("HF_INFERENCE_ENDPOINT_NAME", "aws-notus-7b-v1-4052")
endpoint_namespace = os.getenv("HF_NAMESPACE", "argilla")
token = os.getenv("HF_TOKEN") # hf_...
llm = InferenceEndpointsLLM(
endpoint_name_or_model_id=endpoint_name,
endpoint_namespace=endpoint_namespace,
token=token,
task=TextGenerationTask(),
max_new_tokens=512,
prompt_format="notus",
)
result = llm.generate([{"input": "What are critique LLMs?"}])
# print(result[0][0]["parsed_output"]["generations"])
# Critique LLMs (Long Land Moore Machines) are artificial intelligence models designed specifically for analyzing and evaluating the quality or worth of a particular subject or object. These models can be trained on a large dataset of reviews, ratings, or commentary related to a product, service, artwork, or any other topic of interest.
# The training data can include both positive and negative feedback, helping the LLM to understand the nuanced aspects of quality and value. The model uses natural language processing (NLP) techniques to extract meaningful insights, including sentiment analysis, entity recognition, and text classification.
# Once the model is trained, it can be used to analyze new input data and provide a critical assessment based on its learned understanding of quality and value. For example, a critique LLM for movies could evaluate a new film and generate a detailed review highlighting its strengths, weaknesses, and overall rating.
# Critique LLMs are becoming increasingly useful in various industries, such as e-commerce, education, and entertainment, where they can provide objective and reliable feedback to help guide decision-making processes. They can also aid in content optimization by highlighting areas of improvement or recommending strategies for enhancing user engagement.
# In summary, critique LLMs are powerful tools for analyzing and evaluating the quality or worth of different subjects or objects, helping individuals and organizations make informed decisions with confidence.
Together Inference¶
Together offers a product named Together Inference, which exposes some models for diverse tasks such as chat, text generation, code, or image; these are available via an endpoint within their API either as serverless endpoints or as dedicated instances.
See their release post with more details at Announcing Together Inference Engine – the fastest inference available.
from distilabel.llm import TogetherInferenceLLM
from distilabel.tasks import TextGenerationTask
llm = TogetherInferenceLLM(
model="togethercomputer/llama-2-70b-chat",
task=TextGenerationTask(),
max_new_tokens=512,
temperature=0.3,
prompt_format="llama2",
)
result = llm.generate(
[{"input": "Explain me the theory of relativity as if you were a pirate."}]
)
# >>> print(result[0][0]["parsed_output"]["generations"])
# Ahoy matey! Yer lookin' fer a tale of the theory of relativity, eh? Well,
# settle yerself down with a pint o' grog and listen close, for this be a story
# of the sea of time and space!
# Ye see, matey, the theory of relativity be tellin' us that time and space ain't
# fixed things, like the deck o' a ship or the stars in the sky. Nay, they be like
# the ocean itself, always changin' and flowin' like the tides.
# Now, imagine ...
Vertex AI LLMs¶
The Google Cloud Vertex AI platform allows using Google proprietary models and deploying other models for online predictions. distilabel integrates with Vertex AI through the VertexAILLM and VertexAIEndpointLLM classes.
To use one of these classes you will need to have configured the Google Cloud authentication using one of these methods:
- Setting the GOOGLE_CLOUD_CREDENTIALS environment variable.
- Using the gcloud auth application-default login command.
- Using the vertexai.init Python SDK function from the google-cloud-aiplatform library before instantiating the LLM (see the sketch after this list).
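A minimal sketch of that last option; the project and location values are placeholders:
import vertexai

# Placeholder project id and region; replace them with your own GCP settings.
vertexai.init(project="my-gcp-project", location="us-central1")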
Proprietary models (Gemini and PaLM)¶
VertexAILLM allows using Google proprietary models such as Gemini and PaLM. These models are served through Vertex AI and its different APIs:
- Gemini API: offers models from the Gemini family such as gemini-pro and gemini-pro-vision. More information: Vertex AI - Gemini API.
- Text Generation API: offers models from the PaLM family such as text-bison. More information: Vertex AI - PaLM 2 for text.
- Code Generation API: offers models from the PaLM family for code generation such as code-bison. More information: Vertex AI - Codey for code generation.
from distilabel.llm import VertexAILLM
from distilabel.tasks import TextGenerationTask
llm = VertexAILLM(
task=TextGenerationTask(), model="gemini-pro", max_new_tokens=512, temperature=0.3
)
results = llm.generate(
inputs=[
{"input": "Write a short summary about the Gemini astrological sign"},
],
)
# >>> print(results[0][0]["parsed_output"]["generations"])
# Gemini, the third astrological sign in the zodiac, is associated with the element of
# air and is ruled by the planet Mercury. People born under the Gemini sign are often
# characterized as being intelligent, curious, and communicative. They are known for their
# quick wit, adaptability, and versatility. Geminis are often drawn to learning and enjoy
# exploring new ideas and concepts. They are also known for their social nature and ability
# to connect with others easily. However, Geminis can also be seen as indecisive, restless,
# and superficial at times. They may struggle with commitment and may have difficulty focusing
# on one thing for too long. Overall, Geminis are known for their intelligence, curiosity,
# and social nature.
Endpoints for online prediction¶
The VertexAIEndpointLLM class allows using a model deployed in a Vertex AI Endpoint for online prediction to generate text. Unlike the rest of the LLM classes, which come with a set of pre-defined arguments in their __init__ method, VertexAIEndpointLLM requires providing the generation arguments in a dictionary that will be passed to the generation_kwargs argument. This is because the generation parameters can differ and have different names depending on the Docker image deployed on the Vertex AI Endpoint.
from distilabel.llm import VertexAIEndpointLLM
from distilabel.tasks import TextGenerationTask
llm = VertexAIEndpointLLM(
task=TextGenerationTask(),
endpoint_id="3466410517680095232",
project="experiments-404412",
location="us-central1",
generation_kwargs={
"temperature": 1.0,
"max_tokens": 128,
"top_p": 1.0,
"top_k": 10,
},
)
results = llm.generate(
inputs=[
{"input": "Write a short summary about the Gemini astrological sign"},
],
)
# >>> print(results[0][0]["parsed_output"]["generations"])
# Geminis are known for their curiosity, adaptability, and love of knowledge. They are
# also known for their tendency to be indecisive, impulsive and prone to arguing. They
# are ruled by the planet Mercury, which is associated with communication, quick thinking,
# and change.
Anyscale¶
Anyscale Endpoints offers open source large language models (LLMs) as fully managed API endpoints. Interoperate with open source models as you would with OpenAI:
import os
from distilabel.llm import AnyscaleLLM
from distilabel.tasks import TextGenerationTask
anyscale_llm = AnyscaleLLM(
model="HuggingFaceH4/zephyr-7b-beta",
task=TextGenerationTask(),
api_key=os.environ.get("ANYSCALE_API_KEY"),
)
result = anyscale_llm.generate([{"input": "What is Anyscale?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# 'Anyscale is a machine learning (ML) software company that provides tools and platforms
# for scalable distributed ML workflows. Their offerings enable data scientists and engineers
# to easily and efficiently deploy ML models at scale, both on-premise and on the cloud.
# Anyscale's core technology, Ray, is an open-source framework for distributed Python computation
# that provides a unified interface for distributed computing, resource management, and task scheduling.
# With Anyscale's solutions, businesses can accelerate their ML development and deployment cycles and drive
# greater value from their ML investments.'
For the API reference visit AnyscaleLLM.
MistralAI¶
Mistral.ai, the company behind awesome open source models like Mixtral 8x7B, offers its models through its AI platform. Visit their available models and start creating distilabel datasets with them.
import os
from distilabel.llm import MistralAILLM
from distilabel.tasks import TextGenerationTask
mistralai_llm = MistralAILLM(
model="mistral-tiny",
task=TextGenerationTask(),
api_key=os.environ.get("MISTRALAI_API_KEY"),
)
result = mistralai_llm.generate([{"input": "What's the best french cheese?"}])
# >>> print(result[0][0]["parsed_output"]["generations"])
# I'd be happy to help answer your question, but it's important to note
# that the "best" French cheese can be subjective as it depends on personal
# taste preferences. Some popular and highly regarded French cheeses include
# Roquefort for its strong, tangy flavor and distinct blue veins; Camembert
# for its earthy, mushroomy taste and soft, runny texture; and Brie for its
# creamy, buttery, and slightly sweet taste. I'd recommend trying different
# types to find the one you enjoy the most.
For the API reference visit MistralAILLM.
ProcessLLM and LLMPool¶
By default, distilabel uses a single process, so the generation loop is usually bottlenecked by the model inference time and the Python GIL. To overcome this limitation, we provide the ProcessLLM class, which allows loading an LLM in a different process, avoiding the GIL and allowing the generation loop to be parallelized. Creating a ProcessLLM is as easy as:
from distilabel.llm import LLM, ProcessLLM
from distilabel.tasks import Task, TextGenerationTask
def load_gpt_4(task: Task) -> LLM:
from distilabel.llm import OpenAILLM
return OpenAILLM(
model="gpt-4",
task=task,
num_threads=4,
)
llm = ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_gpt_4)
future = llm.generate(
inputs=[{"input": "Write a letter for Bob"}], num_generations=1
) # (1)
llm.teardown() # (2)
result = future.result()
# >>> print(result[0][0]["parsed_output"]["generations"])
# Dear Bob,
# I hope this letter finds you in good health and high spirits. I know it's been a while since we last caught up, and I wanted to take the time to connect and share a few updates.
# Life has been keeping me pretty busy lately. [Provide a brief overview of what you've been up to: work, school, family, hobbies, etc.]
# I've often found myself reminiscing about the good old days, like when we [include a memorable moment or shared experience with Bob].
1. The ProcessLLM returns a Future containing a list of lists of LLMOutputs.
2. The ProcessLLM needs to be terminated after usage. If the ProcessLLM is used by a Pipeline, it will be terminated automatically.
You can directly use a ProcessLLM as the generator or labeller in a Pipeline. Apart from that, there may be situations in which you would like to generate texts using several LLMs in parallel. For this purpose, we provide the LLMPool class:
from distilabel.llm import LLM, LLMPool, ProcessLLM
from distilabel.tasks import Task, TextGenerationTask
def load_gpt_3(task: Task) -> LLM:
from distilabel.llm import OpenAILLM
return OpenAILLM(
model="gpt-3.5-turbo",
task=task,
num_threads=4,
)
def load_gpt_4(task: Task) -> LLM:
from distilabel.llm import OpenAILLM
return OpenAILLM(
model="gpt-4",
task=task,
num_threads=4,
)
pool = LLMPool(
llms=[
ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_gpt_3),
ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_gpt_4),
]
)
result = pool.generate(inputs=[{"input": "Write a letter for Bob"}], num_generations=2)
pool.teardown()
# >>> print(result[0][0]["parsed_output"]["generations"], end="\n\n\n\n\n\n---->")
# Dear Bob,
# I hope this letter finds you in good health and high spirits. I know it's been a while since we last caught up, and I wanted to take the time to connect and share a few updates.
# Life has been keeping me pretty busy lately. [Provide a brief overview of what you've been up to: work, school, family, hobbies, etc.]
# I've often found myself reminiscing about the good old days, like when we [include a memorable moment or shared experience with Bob].
# >>> print(result[0][1]["parsed_output"]["generations"])
# Of course, I'd be happy to draft a sample letter for you. However, I would need some additional
# information including who "Bob" is, the subject matter of the letter, the tone (formal or informal),
# and any specific details or points you'd like to include. Please provide some more context and I'll do my best to assist you.
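Finally, as mentioned above, a ProcessLLM can also be plugged directly into a Pipeline as the generator or labeller. A minimal sketch of what that could look like, assuming the Pipeline constructor described in the Pipelines section, which accepts generator and labeller arguments:
from distilabel.llm import LLM, ProcessLLM
from distilabel.pipeline import Pipeline
from distilabel.tasks import JudgeLMTask, Task, TextGenerationTask

def load_generator(task: Task) -> LLM:
    from distilabel.llm import OpenAILLM
    return OpenAILLM(model="gpt-3.5-turbo", task=task, num_threads=4)

def load_labeller(task: Task) -> LLM:
    from distilabel.llm import OpenAILLM
    return OpenAILLM(model="gpt-4", task=task, num_threads=4)

# The Pipeline takes care of starting and tearing down both ProcessLLMs.
pipeline = Pipeline(
    generator=ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_generator),
    labeller=ProcessLLM(task=JudgeLMTask(), load_llm_fn=load_labeller),
)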