
Outlines

Frameworks = Literal['transformers', 'llamacpp', 'vllm'] module-attribute

Available frameworks for the structured output configuration.
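Because `Frameworks` is a `typing.Literal`, its allowed values can be read back at runtime with `typing.get_args`. The sketch below uses that to reject an unsupported framework name early. `validate_framework` is a hypothetical helper for illustration, not part of distilabel.

```python
from typing import Literal, get_args

Frameworks = Literal["transformers", "llamacpp", "vllm"]


def validate_framework(name: str) -> str:
    """Hypothetical helper: reject names that are not in the `Frameworks` literal."""
    allowed = get_args(Frameworks)  # the literal's values as a tuple of strings
    if name not in allowed:
        raise ValueError(f"Unsupported framework '{name}'. Choose one of {allowed}.")
    return name
```

Reading the values from the `Literal` itself keeps the check in sync with the type annotation, so adding a framework to `Frameworks` does not require a second list.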

StructuredOutputType

Bases: TypedDict

TypedDict to represent the structured output configuration from outlines.

Source code in src/distilabel/steps/tasks/structured_outputs/outlines.py
from typing import List, Literal, Optional, Type, TypedDict, Union

from pydantic import BaseModel


class StructuredOutputType(TypedDict):
    """TypedDict to represent the structured output configuration from outlines."""

    format: Literal["json", "regex"]
    """One of "json" or "regex"."""
    schema: Union[str, Type[BaseModel]]
    """The schema to use for the structured output. If "json", it
    can be a `pydantic.BaseModel` class or the schema as a string,
    as obtained from `model_to_schema(BaseModel)`. If "regex", it
    should be a regex pattern as a string.
    """
    whitespace_pattern: Optional[Union[str, List[str]]]
    """If "json", a string or a list of strings with a regex pattern
    for JSON syntactic whitespace (it doesn't impact string literals).
    For example, to allow only a single space or newline, use
    `whitespace_pattern=r"[\n ]?"`.
    """

format: Literal['json', 'regex'] instance-attribute

One of "json" or "regex".

schema: Union[str, Type[BaseModel]] instance-attribute

The schema to use for the structured output. If "json", it can be a pydantic.BaseModel class or the schema as a string, as obtained from model_to_schema(BaseModel). If "regex", it should be a regex pattern as a string.

whitespace_pattern: Optional[Union[str, List[str]]] instance-attribute

If "json", a string or a list of strings with a regex pattern for JSON syntactic whitespace (it doesn't impact string literals). For example, to allow only a single space or newline, use whitespace_pattern=r"[\n ]?".
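As an illustration, the two configurations below are plain dicts with the keys of `StructuredOutputType`: one for regex output and one for JSON given as a schema string with a restricted whitespace pattern. The pattern values are made up for the example, and the check with Python's `re` module only shows which strings each pattern admits; it is not how `outlines` enforces them during generation.

```python
import re

# Hypothetical regex configuration: a date such as "2024-05-01".
regex_config = {
    "format": "regex",
    "schema": r"\d{4}-\d{2}-\d{2}",
    "whitespace_pattern": None,  # only meaningful for "json"
}

# Hypothetical JSON configuration with the schema passed as a string, allowing
# at most one space or newline wherever JSON syntax permits whitespace.
json_config = {
    "format": "json",
    "schema": '{"type": "object", "properties": {"name": {"type": "string"}}}',
    "whitespace_pattern": r"[\n ]?",
}


def admits(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None
```

With `r"[\n ]?"`, an empty gap, a single space, or a single newline is admitted, while two consecutive spaces are not.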

model_to_schema(schema)

Helper function to return a string representation of the schema from a pydantic.BaseModel class.

Source code in src/distilabel/steps/tasks/structured_outputs/outlines.py
def model_to_schema(schema: Type[BaseModel]) -> str:
    """Helper function to return a string representation of the schema from a `pydantic.BaseModel` class."""
    return json.dumps(schema.model_json_schema())
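A short usage sketch, assuming pydantic v2 (which provides `model_json_schema`). `User` is a hypothetical model, and the function body mirrors the listing above with its return type annotated as `str`, since `json.dumps` returns a string.

```python
import json
from typing import Type

from pydantic import BaseModel


def model_to_schema(schema: Type[BaseModel]) -> str:
    """Return the JSON schema of a pydantic model, serialized as a string."""
    return json.dumps(schema.model_json_schema())


class User(BaseModel):  # hypothetical example model
    name: str
    age: int


schema_str = model_to_schema(User)
# The string can be used directly as the "schema" of a "json" configuration.
config = {"format": "json", "schema": schema_str, "whitespace_pattern": None}
```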

prepare_guided_output(structured_output, framework, llm)

Prepares the LLM to generate guided output using outlines.

It supports JSON and regex structured outputs for the integrated frameworks.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| structured_output | StructuredOutputType | The structured output configuration. | required |
| framework | Frameworks | The framework to use for the structured output. | required |
| llm | Any | The model object expected by the framework's outlines integration. Each framework requires a different kind of object, so it should be obtained from the LLM itself. | required |

Raises:

| Type | Description |
|---|---|
| ImportError | If outlines is not installed. |
| ValueError | If the format is not "json" or "regex". |
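Before dispatching on the format, the source code checks that `outlines` is importable with `importlib.util.find_spec` and raises `ImportError` otherwise. A generic version of that optional-dependency guard is sketched below; `require_package` is a hypothetical name, not a distilabel function.

```python
import importlib.util
from typing import Optional


def require_package(name: str, pip_name: Optional[str] = None) -> None:
    """Raise ImportError if the top-level package `name` cannot be found.

    `importlib.util.find_spec` locates a top-level module without importing it.
    """
    if importlib.util.find_spec(name) is None:
        install_as = pip_name or name
        raise ImportError(
            f"{name} is not installed. Please install it using `pip install {install_as}`."
        )
```

Calling `require_package("outlines")` would raise unless `outlines` is installed. Because `find_spec` only locates the module, the check stays cheap even for heavy dependencies.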

Returns:

| Type | Description |
|---|---|
| Dict[str, Union[Callable, None]] | A dictionary containing the processor to use for the guided output and, in case of "json", also the schema as a dict, to simplify serialization and deserialization. |
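The shape of the returned dictionary can be illustrated with a self-contained sketch of the same dispatch. Here `fake_json_processor` and `fake_regex_processor` are hypothetical stand-ins for the framework-specific outlines processors, and `schema_as_dict` only handles string schemas; none of these are distilabel or outlines APIs.

```python
import json
from typing import Any, Callable, Dict, Optional


def fake_json_processor(schema: str, llm: Any, whitespace_pattern: Optional[str] = None) -> Callable:
    # Hypothetical stand-in: a real processor would constrain the model's logits.
    return lambda *args: ("json", schema, whitespace_pattern)


def fake_regex_processor(pattern: str, llm: Any) -> Callable:
    # Hypothetical stand-in for a regex-constrained logits processor.
    return lambda *args: ("regex", pattern)


def schema_as_dict(schema: str) -> Dict[str, Any]:
    # Simplified: only accepts a JSON schema string.
    return json.loads(schema)


def sketch_prepare_guided_output(structured_output: Dict[str, Any], llm: Any) -> Dict[str, Any]:
    fmt = structured_output.get("format")
    schema = structured_output.get("schema")
    if fmt == "json":
        return {
            "processor": fake_json_processor(
                schema, llm, whitespace_pattern=structured_output.get("whitespace_pattern")
            ),
            "schema": schema_as_dict(schema),
        }
    if fmt == "regex":
        return {"processor": fake_regex_processor(schema, llm)}
    raise ValueError(f"Invalid format '{fmt}'. Must be either 'json' or 'regex'.")
```

Only the "json" branch adds a "schema" key; the "regex" branch returns the processor alone, and any other format raises `ValueError`.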

Source code in src/distilabel/steps/tasks/structured_outputs/outlines.py
import importlib.util
from typing import Any, Callable, Dict, Union


def prepare_guided_output(
    structured_output: StructuredOutputType,
    framework: Frameworks,
    llm: Any,
) -> Dict[str, Union[Callable, None]]:
    """Prepares the `LLM` to generate guided output using `outlines`.

    It supports JSON and regex structured outputs for the integrated
    frameworks.

    Args:
        structured_output: the structured output configuration.
        framework: the framework to use for the structured output.
        llm: the model object expected by the framework's `outlines`
            integration. Each framework requires a different kind of object,
            so it should be obtained from the `LLM` itself.

    Raises:
        ImportError: if `outlines` is not installed.
        ValueError: if the format is not "json" or "regex".

    Returns:
        A dictionary containing the processor to use for the guided output and,
        in case of "json", also the schema as a dict, to simplify serialization
        and deserialization.
    """
    if not importlib.util.find_spec("outlines"):
        raise ImportError(
            "Outlines is not installed. Please install it using `pip install outlines`."
        )

    json_processor, regex_processor = _get_logits_processor(framework)

    format = structured_output.get("format")
    schema = structured_output.get("schema")

    if format == "json":
        return {
            "processor": json_processor(
                schema,
                llm,
                whitespace_pattern=structured_output.get("whitespace_pattern"),
            ),
            "schema": _schema_as_dict(schema),
        }

    if format == "regex":
        return {"processor": regex_processor(schema, llm)}

    raise ValueError(f"Invalid format '{format}'. Must be either 'json' or 'regex'.")
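The listing relies on a private helper, `_schema_as_dict`, whose source is not shown here. Based on the docstring (the schema is returned as a dict to simplify serialization and deserialization) and the accepted `schema` types, a plausible version is sketched below, assuming pydantic v2. This is an assumption, not distilabel's code, and `Item` is a hypothetical model.

```python
import json
from typing import Any, Dict, Type, Union

from pydantic import BaseModel


def schema_as_dict(schema: Union[str, Type[BaseModel]]) -> Dict[str, Any]:
    """Hypothetical re-implementation: normalize a schema to a plain dict."""
    if isinstance(schema, str):
        # Already serialized, e.g. the output of `model_to_schema`.
        return json.loads(schema)
    if isinstance(schema, type) and issubclass(schema, BaseModel):
        return schema.model_json_schema()
    raise ValueError(f"Unsupported schema type: {type(schema)!r}")


class Item(BaseModel):  # hypothetical example model
    sku: str
    price: float
```

Normalizing both accepted forms to a plain dict keeps the result JSON-serializable, whichever form the user supplied.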