Tasks
The dqa.tasks.tasks.Task class is a core component of the Knowlestry framework. By deriving subclasses
from it, a wide range of operations on data can be implemented.
Various operations implemented in this form are already included in the framework (see Tasks).
Custom extensions can easily be added by creating new subclasses.
The Task class implements modify_*() methods on multiple levels, as described in DQA internal data format.
By default, modify_dataset_dict() on the highest level calls modify_dataset() for each Dataset, and each
lower level continues in an analogous fashion.
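The default delegation between levels can be pictured with a small stand-alone sketch. It does not use the dqa package: the class name SketchTask, the modify_machine() method name, and the nested-dict layout (dataset dict, then dataset, then machine, then data row) are assumptions made for illustration only.

```python
class SketchTask:
    """Toy stand-in for a Task-like base class (hypothetical, not the dqa API)."""

    def __init__(self):
        # Records which level was visited; exists only to make the cascade visible.
        self.calls = []

    def modify_dataset_dict(self, dataset_dict):
        # Highest level: delegate to modify_dataset() for each dataset.
        self.calls.append("dataset_dict")
        return {name: self.modify_dataset(ds) for name, ds in dataset_dict.items()}

    def modify_dataset(self, dataset):
        # Dataset level: delegate to modify_machine() for each machine (assumed name).
        self.calls.append("dataset")
        return {name: self.modify_machine(m) for name, m in dataset.items()}

    def modify_machine(self, machine):
        # Machine level: delegate to modify_data_row() for each data row.
        self.calls.append("machine")
        return {name: self.modify_data_row(row) for name, row in machine.items()}

    def modify_data_row(self, data_row):
        # Lowest level: the default leaves the data row unchanged.
        self.calls.append("data_row")
        return data_row


data = {"train": {"machine_a": {"temperature": [1, 2], "pressure": [3]}}}
task = SketchTask()
result = task.modify_dataset_dict(data)
print(task.calls)
```

With one dataset, one machine and two data rows, the recorded calls show one visit per upper level followed by one modify_data_row() call per data row, and the result equals the input because nothing was overridden.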
Depending on the type of operation to be implemented, overriding the modify_*() method of one specific level
is usually most convenient. For example, an elementary operation such as a logarithm
(dqa.tasks.transformations.Log) only works on a data row and is most conveniently implemented by
overriding the modify_data_row() method. In contrast, the class dqa.tasks.data_structure.JoinMachines
joins the data from multiple machines into one (within a dataset) and is implemented by overriding
modify_dataset().
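The choice of level can be illustrated with two toy subclasses. This is a minimal sketch under stated assumptions, not the actual code of dqa.tasks.transformations.Log or dqa.tasks.data_structure.JoinMachines: the base class SketchTask, the modify_machine() method name, the nested-dict data layout, and the "machine.row" naming of joined rows are all hypothetical.

```python
import math


class SketchTask:
    """Toy base class (hypothetical, not the dqa API): each level delegates downward."""

    def modify_dataset_dict(self, dataset_dict):
        return {k: self.modify_dataset(v) for k, v in dataset_dict.items()}

    def modify_dataset(self, dataset):
        return {k: self.modify_machine(v) for k, v in dataset.items()}

    def modify_machine(self, machine):
        return {k: self.modify_data_row(v) for k, v in machine.items()}

    def modify_data_row(self, data_row):
        return data_row


class LogSketch(SketchTask):
    """Element-wise operation: only the data-row level needs to be overridden."""

    def modify_data_row(self, data_row):
        return [math.log(x) for x in data_row]


class JoinMachinesSketch(SketchTask):
    """Needs all machines of a dataset at once, so the dataset level is overridden."""

    def modify_dataset(self, dataset):
        joined = {}
        for machine_name, machine in dataset.items():
            for row_name, row in machine.items():
                # Assumed naming scheme to keep rows from different machines apart.
                joined[f"{machine_name}.{row_name}"] = list(row)
        return {"joined": joined}


data = {"train": {"m1": {"temp": [1.0, math.e]}, "m2": {"temp": [1.0]}}}
logged = LogSketch().modify_dataset_dict(data)
joined = JoinMachinesSketch().modify_dataset_dict(data)
```

LogSketch inherits the traversal and changes only what happens to a single data row, whereas JoinMachinesSketch replaces the dataset-level step because it has to see every machine before it can produce its output.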
By default, a task is applied to every dataset, every machine, etc. However, the constructor parameters of the Task
class can restrict the parts of the datasets it is applied to. Specifically, the input_dataset parameter
can be given as a single string or a list of strings and specifies that the Task is applied only to these
datasets. By default, it is Null, indicating that the Task is applied to all datasets. By specifying
output_dataset (usually a list with the same length as input_dataset), the output of the Task can also be
written to a different dataset. By default, it is written to the same one. Analogous parameters exist for
the other levels:
input_machine and output_machine specify the Machine names to use as input/output.
input_name and output_name specify the names of the input/output data rows.
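How such a restriction might behave at the dataset level can be sketched as follows. This is an illustration under assumptions, not the Task constructor itself: the class SelectiveTask and the helper as_list are hypothetical, Python's None stands in for the default the text calls Null, a dataset is simplified to a flat mapping of data-row names to values, and both the length check and the pass-through of untouched datasets are guesses about reasonable behavior.

```python
def as_list(value):
    """Accept a single string or a list of strings and always return a list."""
    return [value] if isinstance(value, str) else list(value)


class SelectiveTask:
    """Toy illustration (hypothetical, not the dqa API) of dataset selection."""

    def __init__(self, input_dataset=None, output_dataset=None):
        # None means "apply to all datasets" / "write back to the input dataset".
        self.input_dataset = None if input_dataset is None else as_list(input_dataset)
        self.output_dataset = None if output_dataset is None else as_list(output_dataset)

    def modify_dataset(self, dataset):
        # Placeholder operation: double every value in every data row.
        return {name: [2 * x for x in row] for name, row in dataset.items()}

    def modify_dataset_dict(self, dataset_dict):
        inputs = list(dataset_dict) if self.input_dataset is None else self.input_dataset
        outputs = inputs if self.output_dataset is None else self.output_dataset
        if len(inputs) != len(outputs):
            # Assumption: this sketch insists on a one-to-one pairing.
            raise ValueError("input_dataset and output_dataset must have the same length")
        result = dict(dataset_dict)  # datasets that are not selected pass through unchanged
        for src, dst in zip(inputs, outputs):
            result[dst] = self.modify_dataset(dataset_dict[src])
        return result


data = {"train": {"temp": [1, 2]}, "test": {"temp": [3]}}
selective = SelectiveTask(input_dataset="train", output_dataset="train_doubled")
out = selective.modify_dataset_dict(data)
out_all = SelectiveTask().modify_dataset_dict(data)
```

In this sketch, out keeps "train" and "test" as they were and gains a new "train_doubled" dataset, while out_all overwrites every dataset in place. The same pattern would carry over to the machine and data-row levels with the corresponding input/output parameters.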