Base
StepInput = Annotated[List[Dict[str, Any]], _STEP_INPUT_ANNOTATION] (module attribute)

StepInput is an Annotated alias of the typing type List[Dict[str, Any]] with extra metadata that allows distilabel to perform validations over the process method defined in each Step.
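To illustrate why the extra Annotated metadata is useful, the sketch below uses only the standard library to find which parameters of a process method are annotated with StepInput. The marker value given to _STEP_INPUT_ANNOTATION and the step_input_params helper are hypothetical; this is not necessarily how distilabel performs its own validation.

```python
from typing import Annotated, Any, Dict, List, get_type_hints

# Hypothetical marker value; the real _STEP_INPUT_ANNOTATION in distilabel may differ.
_STEP_INPUT_ANNOTATION = "step_input"

StepInput = Annotated[List[Dict[str, Any]], _STEP_INPUT_ANNOTATION]


def step_input_params(func) -> List[str]:
    """Return the names of the parameters of `func` annotated with StepInput."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    return [
        name
        for name, hint in hints.items()
        # Annotated aliases expose their extra metadata through __metadata__.
        if _STEP_INPUT_ANNOTATION in getattr(hint, "__metadata__", ())
    ]


class MyStep:
    # Hypothetical step used only to exercise the helper above.
    def process(self, *inputs: StepInput, verbose: bool = False):
        yield from inputs


print(step_input_params(MyStep.process))
```

Only inputs is reported, because verbose carries a plain bool annotation without the marker.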
GeneratorStep
Bases: _Step, ABC
A special kind of Step that is able to generate data, i.e. it doesn't receive any input from previous steps.
Attributes:
Name | Type | Description |
---|---|---|
batch_size | RuntimeParameter[int] | The number of rows contained in each batch generated by the step. Defaults to 50. |

Runtime parameters:
batch_size: The number of rows contained in each batch generated by the step. Defaults to 50.
Source code in src/distilabel/steps/base.py
process(offset=0) (abstractmethod)
Method that defines the generation logic of the step. It should yield the output rows and a boolean indicating if it's the last batch or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset to start the generation from. Defaults to 0. | 0 |
Yields:
Type | Description |
---|---|
GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. |
Source code in src/distilabel/steps/base.py
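The contract described above (process takes an offset and yields pairs of output rows and a last-batch flag, in batches of batch_size rows) can be sketched without distilabel. The class names are hypothetical, and the shape of GeneratorStepOutput is assumed from the Yields description rather than taken from the library.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Tuple

# Assumption: a GeneratorStepOutput is an iterator of (rows, is_last_batch) tuples.
GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]


class SketchGeneratorStep(ABC):
    """Hypothetical stand-in for GeneratorStep; not the distilabel class."""

    def __init__(self, batch_size: int = 50) -> None:
        self.batch_size = batch_size

    @abstractmethod
    def process(self, offset: int = 0) -> GeneratorStepOutput:
        """Yield (rows, is_last_batch) tuples, starting at `offset`."""


class ListGenerator(SketchGeneratorStep):
    """Generates rows from an in-memory list, batch_size rows at a time."""

    def __init__(self, data: List[Dict[str, Any]], batch_size: int = 50) -> None:
        super().__init__(batch_size)
        self.data = data

    def process(self, offset: int = 0) -> GeneratorStepOutput:
        if offset >= len(self.data):
            # Nothing left to generate: emit an empty final batch.
            yield [], True
            return
        # Starting at `offset` lets generation resume part-way through the data.
        for start in range(offset, len(self.data), self.batch_size):
            batch = self.data[start : start + self.batch_size]
            yield batch, start + self.batch_size >= len(self.data)


gen = ListGenerator([{"n": i} for i in range(5)], batch_size=2)
for rows, is_last in gen.process(offset=1):
    print(rows, is_last)
```

With offset=1 the generator skips the first row and flags only the second batch as the last one.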
process_applying_mappings(offset=0)
Runs the process method of the step, applying the output_mappings to the output rows. This is the function that should be used to run the generation logic of the step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset to start the generation from. Defaults to 0. | 0 |
Yields:
Type | Description |
---|---|
GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. |
Source code in src/distilabel/steps/base.py
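A minimal sketch of wrapping a generator's process so that its output columns are renamed batch by batch while the last-batch flag passes through untouched. The helper name process_applying_output_mappings and fake_process are hypothetical, and the mapping direction ({name produced by the step: new name}) is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Assumption: a GeneratorStepOutput is an iterator of (rows, is_last_batch) tuples.
GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]


def process_applying_output_mappings(
    process: Callable[[int], GeneratorStepOutput],
    output_mappings: Dict[str, str],
    offset: int = 0,
) -> GeneratorStepOutput:
    """Run `process(offset)` and rename output columns batch by batch."""
    for rows, is_last in process(offset):
        # Assumed convention: output_mappings = {name_produced_by_step: new_name}.
        renamed = [
            {output_mappings.get(key, key): value for key, value in row.items()}
            for row in rows
        ]
        # The last-batch flag passes through unchanged.
        yield renamed, is_last


def fake_process(offset: int = 0) -> GeneratorStepOutput:
    # Hypothetical generation logic that ignores `offset` for brevity.
    yield [{"text": "a"}, {"text": "b"}], False
    yield [{"text": "c"}], True


for rows, is_last in process_applying_output_mappings(fake_process, {"text": "instruction"}):
    print(rows, is_last)
```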
GlobalStep
Bases: Step, ABC
A special kind of Step whose process method receives all the data processed by its previous steps at once, instead of receiving it in batches. This kind of step is useful when the processing logic requires all the data at once, for example to train a model or to perform a global aggregation.
Source code in src/distilabel/steps/base.py
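A small example of why some logic needs every row at once: normalizing a score by its maximum gives different results when the rows arrive in batches, because each batch only sees its own maximum. The function normalize_scores and the "score" column are hypothetical and are not part of distilabel.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def normalize_scores(rows: List[Row]) -> Iterator[List[Row]]:
    """Divide each row's score by the maximum score among the rows received."""
    # Assumes `rows` is non-empty and scores are positive numbers.
    max_score = max(row["score"] for row in rows)
    yield [{**row, "norm_score": row["score"] / max_score} for row in rows]


data = [{"score": 2}, {"score": 4}, {"score": 8}]

# GlobalStep-style: one call with every row, so the maximum is 8 for all of them.
global_rows = next(normalize_scores(data))

# Batch-style: each batch only sees its own maximum, which skews the result.
batched_rows = [
    row for batch in (data[:2], data[2:]) for row in next(normalize_scores(batch))
]

print([r["norm_score"] for r in global_rows])
print([r["norm_score"] for r in batched_rows])
```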
Step
Bases: _Step, ABC
Base class for the steps that can be included in a Pipeline.
Attributes:
Name | Type | Description |
---|---|---|
input_batch_size | RuntimeParameter[PositiveInt] | The number of rows contained in each batch processed by the step. Defaults to 50. |

Runtime parameters:
input_batch_size: The number of rows contained in each batch processed by the step. Defaults to 50.
Source code in src/distilabel/steps/base.py
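As a rough picture of what input_batch_size controls, the sketch below splits incoming rows into consecutive batches of at most that many rows and rejects non-positive sizes, mirroring the PositiveInt type. The iter_batches helper is hypothetical; how the distilabel pipeline actually assembles batches is not shown here.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def iter_batches(rows: List[Row], input_batch_size: int = 50) -> Iterator[List[Row]]:
    """Split incoming rows into consecutive batches of at most input_batch_size rows."""
    if input_batch_size < 1:
        # Mirrors the PositiveInt constraint on input_batch_size.
        raise ValueError("input_batch_size must be a positive integer")
    for start in range(0, len(rows), input_batch_size):
        yield rows[start : start + input_batch_size]


rows = [{"i": i} for i in range(120)]
print([len(batch) for batch in iter_batches(rows)])
```

With the default of 50, 120 incoming rows become batches of 50, 50 and 20 rows.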
process(*inputs) (abstractmethod)
Method that defines the processing logic of the step. It should yield the output rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*inputs | StepInput | An argument used to receive the outputs of the previous steps. The number of arguments depends on the number of previous steps. It doesn't need to be an | () |
Source code in src/distilabel/steps/base.py
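A minimal sketch of a process method that accepts one positional argument per previous step and merges the rows column-wise. The CombineColumns class is hypothetical, StepInput is simplified to a plain alias without the Annotated metadata, and matching rows by position across predecessors is an assumption of this sketch.

```python
from typing import Any, Dict, Iterator, List

StepInput = List[Dict[str, Any]]  # simplified: the Annotated metadata is dropped here


class CombineColumns:
    """Hypothetical step whose process merges rows from any number of previous steps."""

    def process(self, *inputs: StepInput) -> Iterator[List[Dict[str, Any]]]:
        # One positional argument per previous step; rows are matched by position,
        # and zip() stops at the shortest input.
        merged = []
        for rows in zip(*inputs):
            combined: Dict[str, Any] = {}
            for row in rows:
                combined.update(row)
            merged.append(combined)
        yield merged


step = CombineColumns()
print(next(step.process([{"instruction": "hi"}], [{"generation": "hello"}])))
```

The same method works unchanged with a single predecessor, since the number of arguments simply follows the number of previous steps.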
process_applying_mappings(*args)
Runs the process method of the step, applying the input_mappings to the input rows and the output_mappings to the output rows. This is the function that should be used to run the processing logic of the step.
Yields:
Type | Description |
---|---|
StepOutput | The output rows. |
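The two mapping passes can be pictured as a wrapper around process. In this sketch the direction of each mapping is an assumption (input_mappings maps the name the step expects to the column present in the data; output_mappings maps the name the step produces to the name downstream steps should see), and both the helper and the uppercase step are hypothetical. The sketch does not restore the original input column names afterwards, which the real method may or may not do.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def process_applying_mappings(
    process: Callable[..., Iterator[List[Row]]],
    input_mappings: Dict[str, str],
    output_mappings: Dict[str, str],
    *inputs: List[Row],
) -> Iterator[List[Row]]:
    # Assumed convention: input_mappings = {name_expected_by_step: column_in_data}.
    reverse_inputs = {column: expected for expected, column in input_mappings.items()}
    mapped_inputs = [
        [{reverse_inputs.get(k, k): v for k, v in row.items()} for row in rows]
        for rows in inputs
    ]
    # Assumed convention: output_mappings = {name_produced_by_step: new_name}.
    for output_rows in process(*mapped_inputs):
        yield [{output_mappings.get(k, k): v for k, v in row.items()} for row in output_rows]


def uppercase(*inputs: List[Row]) -> Iterator[List[Row]]:
    # Hypothetical process: expects an "instruction" column, produces "generation".
    for rows in inputs:
        yield [{**row, "generation": row["instruction"].upper()} for row in rows]


result = list(
    process_applying_mappings(
        uppercase, {"instruction": "prompt"}, {"generation": "response"}, [{"prompt": "hi"}]
    )
)
print(result)
```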