Types¶
This section contains the different types used accross the distilabel codebase.
base
¶
ChatType = List[ChatItem]
module-attribute
¶
ChatType is a type alias for a list
of dict
s following the OpenAI conversational format.
ImageUrl
¶
ImageContent
¶
Bases: TypedDict
Type alias for the user's message in a conversation that can include text or an image. It's the standard type for vision language models: https://platform.openai.com/docs/guides/vision
Source code in src/distilabel/typing/base.py
steps
¶
StepOutput = Iterator[List[Dict[str, Any]]]
module-attribute
¶
StepOutput
is an alias of the typing Iterator[List[Dict[str, Any]]]
GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]
module-attribute
¶
GeneratorStepOutput
is an alias of the typing Iterator[Tuple[List[Dict[str, Any]], bool]]
StepColumns = Union[List[str], Dict[str, bool]]
module-attribute
¶
StepColumns
is an alias of the typing Union[List[str], Dict[str, bool]]
used by the
inputs
and outputs
properties of an Step
. In the case of a List[str]
, it is a list
with the required columns. In the case of a Dict[str, bool]
, it is a dictionary where
the keys are the columns and the values are booleans indicating whether the column is
required or not.
models
¶
LLMLogprobs = List[List[List[Logprob]]]
module-attribute
¶
A type alias representing the probability distributions output by an LLM
.
Structure
- Outermost list: contains multiple generation choices when sampling (
n
sequences) - Middle list: represents each position in the generated sequence
- Innermost list: contains the log probabilities for each token in the vocabulary at that position
LLMStatistics = Union[TokenCount, Dict[str, Any]]
module-attribute
¶
Initially the LLMStatistics will contain the token count, but can have more variables. They can be added once we have them defined for every LLM.
StructuredOutputType = Union[OutlinesStructuredOutputType, InstructorStructuredOutputType]
module-attribute
¶
StructuredOutputType is an alias for the union of OutlinesStructuredOutputType
and InstructorStructuredOutputType
.
StandardInput = ChatType
module-attribute
¶
StandardInput is an alias for ChatType that defines the default / standard input produced by format_input
.
StructuredInput = Tuple[StandardInput, Union[StructuredOutputType, None]]
module-attribute
¶
StructuredInput defines a type produced by format_input
when using either StructuredGeneration
or a subclass of it.
FormattedInput = Union[StandardInput, StructuredInput, str]
module-attribute
¶
FormattedInput is an alias for the union of StandardInput
and StructuredInput
as generated
by format_input
and expected by the LLM
s, as well as ConversationType
for the vision language models.
OutlinesStructuredOutputType
¶
Bases: TypedDict
TypedDict to represent the structured output configuration from outlines
.
Source code in src/distilabel/typing/models.py
format
instance-attribute
¶
One of "json" or "regex".
schema
instance-attribute
¶
The schema to use for the structured output. If "json", it
can be a pydantic.BaseModel class, or the schema as a string,
as obtained from model_to_schema(BaseModel)
, if "regex", it
should be a regex pattern as a string.
whitespace_pattern
instance-attribute
¶
If "json" corresponds to a string or a list of
strings with a pattern (doesn't impact string literals).
For example, to allow only a single space or newline with
whitespace_pattern=r"[
]?"
InstructorStructuredOutputType
¶
Bases: TypedDict
TypedDict to represent the structured output configuration from instructor
.
Source code in src/distilabel/typing/models.py
format
instance-attribute
¶
One of "json".
schema
instance-attribute
¶
The schema to use for the structured output, a pydantic.BaseModel
class.
mode
instance-attribute
¶
Generation mode. Take a look at instructor.Mode
for more information, if not informed it will
be determined automatically.
max_retries
instance-attribute
¶
Number of times to reask the model in case of error, if not set will default to the model's default.
pipeline
¶
DownstreamConnectable = Union['Step', 'GlobalStep']
module-attribute
¶
Alias for the Step
types that can be connected as downstream steps.
UpstreamConnectableSteps = TypeVar('UpstreamConnectableSteps', bound=Union['Step', 'GlobalStep', 'GeneratorStep'])
module-attribute
¶
Type for the Step
types that can be connected as upstream steps.
DownstreamConnectableSteps = TypeVar('DownstreamConnectableSteps', bound=DownstreamConnectable, covariant=True)
module-attribute
¶
Type for the Step
types that can be connected as downstream steps.
PipelineRuntimeParametersInfo = Dict[str, Union[List['RuntimeParameterInfo'], Dict[str, 'RuntimeParameterInfo']]]
module-attribute
¶
Alias for the information of the runtime parameters of a Pipeline
.
InputDataset = Union['Dataset', 'pd.DataFrame', List[Dict[str, str]]]
module-attribute
¶
Alias for the types we can process as input dataset.
LoadGroups = Union[List[List[Any]], Literal['sequential_step_execution']]
module-attribute
¶
Alias for the types that can be used as load groups.
- if
List[List[Any]]
, it's a list containing lists of steps that have to be loaded in isolation. - if "sequential_step_execution", each step will be loaded in a different stage i.e. only one step will be executed at a time.
StepLoadStatus
¶
Bases: TypedDict
Dict containing information about if one step was loaded/unloaded or if it's load failed