Pipeline Utils
group_columns(*inputs, group_columns, output_group_columns=None)
Groups multiple lists of dictionaries into a single list of dictionaries on the specified `group_columns`. If `output_group_columns` are provided, the grouped columns are also renamed to them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`inputs` | `StepInput` | Lists of dictionaries to combine. | `()` |
`group_columns` | `List[str]` | List of keys to merge on. | required |
`output_group_columns` | `Optional[List[str]]` | List of keys to rename the merge keys to. Defaults to `None`. | `None` |
Returns:
Type | Description |
---|---|
`StepInput` | A list of dictionaries where the values of the `group_columns` are combined into a list and renamed to `output_group_columns`. |
Source code in src/distilabel/pipeline/utils.py
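To illustrate the behaviour described above, here is a minimal sketch that does not import distilabel and is not the library source. It re-implements the documented contract under stated assumptions: `Rows` stands in for `StepInput`, rows are paired by position across the inputs, and keys outside `group_columns` are copied through with the last input winning. The name `group_columns_sketch` is hypothetical.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]  # stand-in for distilabel's StepInput (assumption)


def group_columns_sketch(
    *inputs: Rows,
    group_columns: List[str],
    output_group_columns: Optional[List[str]] = None,
) -> Rows:
    """Illustrative re-implementation of the documented behaviour."""
    if output_group_columns is None:
        # No rename requested: grouped keys keep their original names.
        output_group_columns = group_columns
    if len(output_group_columns) != len(group_columns):
        raise ValueError("output_group_columns must match group_columns in length")
    rename = dict(zip(group_columns, output_group_columns))

    result: Rows = []
    # Assumption: rows are paired by position across the inputs.
    for rows in zip(*inputs):
        combined: Dict[str, Any] = {}
        for row in rows:
            for key, value in row.items():
                if key in rename:
                    # Grouped keys collect one value per input, under the new name.
                    combined.setdefault(rename[key], []).append(value)
                else:
                    # Assumption: other keys are copied as-is, last input wins.
                    combined[key] = value
        result.append(combined)
    return result


batch_a = [{"instruction": "What is 2 + 2?", "generation": "4"}]
batch_b = [{"instruction": "What is 2 + 2?", "generation": "four"}]

grouped = group_columns_sketch(
    batch_a,
    batch_b,
    group_columns=["generation"],
    output_group_columns=["generations"],
)
print(grouped)
# [{'instruction': 'What is 2 + 2?', 'generations': ['4', 'four']}]
```

With `output_group_columns` omitted, the same call would keep the key name `generation` and still collect both values into a list.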
merge_columns(row, columns, new_column='combined_key')
Merges columns in a dictionary into a single column under the specified `new_column`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`row` | `Dict[str, Any]` | Dictionary corresponding to a row in a dataset. | required |
`columns` | `List[str]` | List of keys to merge. | required |
`new_column` | `str` | Name of the new key created. | `'combined_key'` |
Returns:
Type | Description |
---|---|
`Dict[str, Any]` | Dictionary with the new merged key. |
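A minimal sketch of what merging columns in a row can look like, written without importing distilabel. It rests on assumptions the text above does not confirm: the input row is copied rather than mutated, the merged keys are removed from the result, and list values are concatenated while scalar values are appended. The name `merge_columns_sketch` is hypothetical.

```python
from typing import Any, Dict, List


def merge_columns_sketch(
    row: Dict[str, Any], columns: List[str], new_column: str = "combined_key"
) -> Dict[str, Any]:
    """Illustrative re-implementation of the documented behaviour."""
    result = dict(row)  # shallow copy so the caller's row is untouched (assumption)
    merged: List[Any] = []
    for key in columns:
        value = result.pop(key)  # assumption: merged keys are dropped from the row
        # Assumption: list values are concatenated, scalars are appended.
        merged.extend(value if isinstance(value, list) else [value])
    result[new_column] = merged
    return result


row = {"prompt": "Hi", "gen_a": ["x", "y"], "gen_b": "z"}
merged_row = merge_columns_sketch(row, ["gen_a", "gen_b"], new_column="generations")
print(merged_row)
# {'prompt': 'Hi', 'generations': ['x', 'y', 'z']}
```

Leaving `new_column` unset would store the merged list under `combined_key`, matching the default shown in the parameter table.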