Types¶
This section contains the different types used accross the distilabel codebase.
base
¶
ChatType = List[ChatItem]
module-attribute
¶
ChatType is a type alias for a list of dicts following the OpenAI conversational format.
ImageUrl
¶
ImageContent
¶
Bases: TypedDict
Type alias for the user's message in a conversation that can include text or an image. It's the standard type for vision language models: https://platform.openai.com/docs/guides/vision
Source code in src/distilabel/typing/base.py
steps
¶
StepOutput = Iterator[List[Dict[str, Any]]]
module-attribute
¶
StepOutput is an alias of the typing Iterator[List[Dict[str, Any]]]
GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]
module-attribute
¶
GeneratorStepOutput is an alias of the typing Iterator[Tuple[List[Dict[str, Any]], bool]]
StepColumns = Union[List[str], Dict[str, bool]]
module-attribute
¶
StepColumns is an alias of the typing Union[List[str], Dict[str, bool]] used by the
inputs and outputs properties of an Step. In the case of a List[str], it is a list
with the required columns. In the case of a Dict[str, bool], it is a dictionary where
the keys are the columns and the values are booleans indicating whether the column is
required or not.
models
¶
LLMLogprobs = List[List[List[Logprob]]]
module-attribute
¶
A type alias representing the probability distributions output by an LLM.
Structure
- Outermost list: contains multiple generation choices when sampling (
nsequences) - Middle list: represents each position in the generated sequence
- Innermost list: contains the log probabilities for each token in the vocabulary at that position
LLMStatistics = Union[TokenCount, Dict[str, Any]]
module-attribute
¶
Initially the LLMStatistics will contain the token count, but can have more variables. They can be added once we have them defined for every LLM.
StructuredOutputType = Union[OutlinesStructuredOutputType, InstructorStructuredOutputType]
module-attribute
¶
StructuredOutputType is an alias for the union of OutlinesStructuredOutputType and InstructorStructuredOutputType.
StandardInput = ChatType
module-attribute
¶
StandardInput is an alias for ChatType that defines the default / standard input produced by format_input.
StructuredInput = Tuple[StandardInput, Union[StructuredOutputType, None]]
module-attribute
¶
StructuredInput defines a type produced by format_input when using either StructuredGeneration or a subclass of it.
FormattedInput = Union[StandardInput, StructuredInput, str]
module-attribute
¶
FormattedInput is an alias for the union of StandardInput and StructuredInput as generated
by format_input and expected by the LLMs, as well as ConversationType for the vision language models.
OutlinesStructuredOutputType
¶
Bases: TypedDict
TypedDict to represent the structured output configuration from outlines.
Source code in src/distilabel/typing/models.py
format
instance-attribute
¶
One of "json" or "regex".
schema
instance-attribute
¶
The schema to use for the structured output. If "json", it
can be a pydantic.BaseModel class, or the schema as a string,
as obtained from model_to_schema(BaseModel), if "regex", it
should be a regex pattern as a string.
whitespace_pattern
instance-attribute
¶
If "json" corresponds to a string or a list of
strings with a pattern (doesn't impact string literals).
For example, to allow only a single space or newline with
whitespace_pattern=r"[
]?"
InstructorStructuredOutputType
¶
Bases: TypedDict
TypedDict to represent the structured output configuration from instructor.
Source code in src/distilabel/typing/models.py
format
instance-attribute
¶
One of "json".
schema
instance-attribute
¶
The schema to use for the structured output, a pydantic.BaseModel class.
mode
instance-attribute
¶
Generation mode. Take a look at instructor.Mode for more information, if not informed it will
be determined automatically.
max_retries
instance-attribute
¶
Number of times to reask the model in case of error, if not set will default to the model's default.
pipeline
¶
DownstreamConnectable = Union['Step', 'GlobalStep']
module-attribute
¶
Alias for the Step types that can be connected as downstream steps.
UpstreamConnectableSteps = TypeVar('UpstreamConnectableSteps', bound=(Union['Step', 'GlobalStep', 'GeneratorStep']))
module-attribute
¶
Type for the Step types that can be connected as upstream steps.
DownstreamConnectableSteps = TypeVar('DownstreamConnectableSteps', bound=DownstreamConnectable, covariant=True)
module-attribute
¶
Type for the Step types that can be connected as downstream steps.
PipelineRuntimeParametersInfo = Dict[str, Union[List['RuntimeParameterInfo'], Dict[str, 'RuntimeParameterInfo']]]
module-attribute
¶
Alias for the information of the runtime parameters of a Pipeline.
InputDataset = Union['Dataset', 'pd.DataFrame', List[Dict[str, str]]]
module-attribute
¶
Alias for the types we can process as input dataset.
LoadGroups = Union[List[List[Any]], Literal['sequential_step_execution']]
module-attribute
¶
Alias for the types that can be used as load groups.
- if
List[List[Any]], it's a list containing lists of steps that have to be loaded in isolation. - if "sequential_step_execution", each step will be loaded in a different stage i.e. only one step will be executed at a time.
StepLoadStatus
¶
Bases: TypedDict
Dict containing information about if one step was loaded/unloaded or if it's load failed