Configuration
Multiple tasks can be arranged in a TaskList to form a data-processing pipeline. To provide a simple, standardized format for such pipelines, they can be defined in a JSON file.
The class dqa.config.ConfigSystem parses such JSON files. Their structure is described here.
The root dictionary of a JSON configuration file can contain the following keys: TaskLists, Parameters, Includes, and Algos.
Key: Parameters
Under this key, a dictionary of general parameters can be specified, as the following example shows:
{
    "Parameters": {
        "parameter1": 5,
        "parameter2": "value2"
    }
}
These values can then be used in the rest of the file: every occurrence of the string "%parameter1" is replaced by the value of parameter1, here 5.
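The substitution rule can be illustrated with a small stand-alone sketch. It is not the code of dqa.config.ConfigSystem; the function substitute and its exact behaviour are assumptions made for illustration. In particular, the sketch assumes that a string consisting only of a placeholder receives the raw value (so 5 stays an integer), that placeholders inside longer strings are replaced textually, and that unknown names are left unchanged.

```python
import re
from typing import Any, Dict

# Matches "%name"; the name syntax (letters, digits, underscores) is an assumption.
_PLACEHOLDER = re.compile(r"%(\w+)")


def substitute(node: Any, params: Dict[str, Any]) -> Any:
    """Recursively replace "%name" placeholders with values from params (hypothetical helper)."""
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    if isinstance(node, str):
        whole = _PLACEHOLDER.fullmatch(node)
        if whole and whole.group(1) in params:
            # The whole string is one placeholder: keep the value's type (e.g. int 5).
            return params[whole.group(1)]
        # Placeholders embedded in longer strings are replaced as text;
        # unknown names are kept verbatim.
        return _PLACEHOLDER.sub(
            lambda m: str(params[m.group(1)]) if m.group(1) in params else m.group(0),
            node,
        )
    return node


config = {
    "Parameters": {"parameter1": 5, "parameter2": "value2"},
    "TaskLists": {
        "List1": [{"Class": "TaskClass1", "size": "%parameter1", "label": "run_%parameter2"}]
    },
}
resolved = substitute(config["TaskLists"], config["Parameters"])
print(resolved)  # {'List1': [{'Class': 'TaskClass1', 'size': 5, 'label': 'run_value2'}]}
```

Matching whole names with a regular expression, rather than plain string replacement, avoids replacing "%parameter1" inside a longer name such as "%parameter10".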
Key: TaskLists
This key holds a dictionary that maps names to TaskLists, each of which is a list of Tasks.
{
    "TaskLists": {
        "List1": [
            {
                "Class": "TaskClass1",
                "parameter1": "value1",
                "parameter2": "value2"
            },
            {
                "Class": "TaskClass2",
                "parameter1": "value1",
                "parameter2": "value2"
            }
        ]
    }
}
Each task in the list is represented by a dictionary of parameters. The main parameter, Class, specifies the class of the Task, which should be a subclass of dqa.tasks.tasks.Task. Note that custom classes must be registered with ConfigSystem.load_task_class(TaskClass) to be recognized. All other parameters in the dictionary besides Class are passed to the constructor of this class when the objects are generated.
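How a task dictionary might be turned into an object can be sketched as follows. The registry TASK_CLASSES, the helpers load_task_class and build_task, and the class TaskClass1 are hypothetical stand-ins that only mimic the behaviour described for ConfigSystem.load_task_class and dqa.tasks.tasks.Task; they are not the dqa API.

```python
from typing import Any, Dict, Type


class Task:
    """Stand-in for dqa.tasks.tasks.Task; the real base class is not reproduced here."""


# Hypothetical registry playing the role of ConfigSystem.load_task_class().
TASK_CLASSES: Dict[str, Type[Task]] = {}


def load_task_class(cls: Type[Task]) -> None:
    """Register cls under its class name so that "Class" entries can refer to it."""
    if not issubclass(cls, Task):
        raise TypeError(f"{cls.__name__} is not a subclass of Task")
    TASK_CLASSES[cls.__name__] = cls


def build_task(entry: Dict[str, Any]) -> Task:
    """Instantiate one task dictionary: "Class" selects the class, all other keys become kwargs."""
    kwargs = dict(entry)  # copy, so the parsed configuration is left unchanged
    class_name = kwargs.pop("Class")
    if class_name not in TASK_CLASSES:
        raise KeyError(f"task class {class_name!r} has not been registered")
    return TASK_CLASSES[class_name](**kwargs)


class TaskClass1(Task):
    """Hypothetical custom task accepting the two parameters of the JSON example."""

    def __init__(self, parameter1: str, parameter2: str) -> None:
        self.parameter1 = parameter1
        self.parameter2 = parameter2


load_task_class(TaskClass1)
task = build_task({"Class": "TaskClass1", "parameter1": "value1", "parameter2": "value2"})
print(type(task).__name__, task.parameter1, task.parameter2)  # TaskClass1 value1 value2
```

An unregistered class name fails early with a KeyError instead of silently producing no task.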
Instead of a single Task, an entire TaskList can also be embedded in another TaskList in the following way:
{
    "TaskLists": {
        "List1": [
            {
                "List": "List2",
                "parameters": { "p1": 7 }
            }
        ],
        "List2": []
    }
}
Here, instead of Class, the entry contains List, which names the TaskList to embed. The parameters dictionary sets parameters for the nested list; within that list, their values can be accessed with "%parameter_name" (in this example, "%p1").
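The expansion of nested TaskLists can be pictured as a recursive flattening step. This is a sketch under assumptions, not dqa's implementation: fill and expand are hypothetical helpers, only strings of the exact form "%name" are substituted, a nested list is assumed to inherit the parameters of the enclosing list (overridden by its own parameters entry), and cyclic references are rejected.

```python
from typing import Any, Dict, List, Optional, Tuple


def fill(node: Any, params: Dict[str, Any]) -> Any:
    """Replace strings of the exact form "%name" with params[name] (simplified substitution)."""
    if isinstance(node, dict):
        return {key: fill(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [fill(item, params) for item in node]
    if isinstance(node, str) and node.startswith("%") and node[1:] in params:
        return params[node[1:]]
    return node


def expand(
    list_name: str,
    task_lists: Dict[str, List[Dict[str, Any]]],
    params: Optional[Dict[str, Any]] = None,
    _seen: Tuple[str, ...] = (),
) -> List[Dict[str, Any]]:
    """Flatten a TaskList, replacing every {"List": ...} entry by the expanded nested list."""
    if list_name in _seen:
        raise ValueError("cyclic TaskList reference: " + " -> ".join(_seen + (list_name,)))
    params = params or {}
    flat: List[Dict[str, Any]] = []
    for entry in fill(task_lists[list_name], params):
        if "List" in entry:
            # Assumption: the nested list sees the outer parameters, overridden by its own.
            nested_params = {**params, **entry.get("parameters", {})}
            flat.extend(expand(entry["List"], task_lists, nested_params, _seen + (list_name,)))
        else:
            flat.append(entry)
    return flat


task_lists = {
    "List1": [{"List": "List2", "parameters": {"p1": 7}}, {"Class": "TaskClassB"}],
    "List2": [{"Class": "TaskClassA", "size": "%p1"}],
}
flat = expand("List1", task_lists)
print(flat)  # [{'Class': 'TaskClassA', 'size': 7}, {'Class': 'TaskClassB'}]
```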
Key: Includes
With this key, other JSON files can be incorporated into the current one, so that the TaskLists they define can be used. JSON files that are meant to be included in other configurations are also referred to as templates.
{
    "Includes": [
        {
            "Filename": ["File1.json", "File2.json"]
        }
    ]
}
This includes File1.json and File2.json. If File1.json contains a TaskList named TaskList1, it can be included in another TaskList with the following entry:
{
    "List": "File1.TaskList1"
}
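One plausible way of resolving names such as "File1.TaskList1" is to register every TaskList of an included file under its file name without the .json extension. The helper collect_task_lists below is a hypothetical sketch of this idea, not part of dqa; it assumes that included paths are relative to the including file and ignores nested includes and parameters.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def collect_task_lists(config_path: Path) -> Dict[str, list]:
    """Return the TaskLists of a configuration plus those of its included templates.

    Assumption: a template's TaskLists are exposed as "<file stem>.<list name>".
    """
    config = json.loads(config_path.read_text())
    task_lists = dict(config.get("TaskLists", {}))
    for include in config.get("Includes", []):
        for filename in include["Filename"]:
            template_path = config_path.parent / filename  # relative to the including file
            template = json.loads(template_path.read_text())
            for name, tasks in template.get("TaskLists", {}).items():
                task_lists[f"{template_path.stem}.{name}"] = tasks
    return task_lists


# Demo with throw-away files that mirror the example above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "File1.json").write_text(
        json.dumps({"TaskLists": {"TaskList1": [{"Class": "TaskClass1"}]}}))
    (root / "File2.json").write_text(json.dumps({"TaskLists": {}}))
    (root / "main.json").write_text(json.dumps({
        "Includes": [{"Filename": ["File1.json", "File2.json"]}],
        "TaskLists": {"Main": [{"List": "File1.TaskList1"}]},
    }))
    lists = collect_task_lists(root / "main.json")

print(sorted(lists))  # ['File1.TaskList1', 'Main']
```

Prefixing with the file stem keeps lists from different templates apart even when they share a name.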
Key: Algos
In this section, model algorithms can be defined, usually regressors or classifiers such as neural networks or support vector machines. For example:
{
    "Algos": {
        "MyAlgo": {
            "Class": "MLPRegressor",
            "verbose": true,
            "max_iter": 4500
        }
    }
}
As with Tasks, the Class parameter specifies the class to be loaded, and all other parameters are passed to the constructor of this class. A custom class can be registered with ConfigSystem.load_algorithm_class(AlgoClass).
The classes used here do not need to inherit from a specific base class. However, like the model classes of the scikit-learn library, they must implement fit(X, y) and predict(X) methods. In fact, scikit-learn models can usually be used directly.
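The required interface can be stated explicitly with a typing.Protocol. The Model protocol and the MeanRegressor class below are illustrative assumptions, not part of dqa; they only show that a plain class with fit and predict methods, and no common base class, satisfies the described contract.

```python
from typing import Any, List, Optional, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Model(Protocol):
    """Interface expected of an Algos class: scikit-learn-style fit(X, y) and predict(X)."""

    def fit(self, X: Sequence[Any], y: Sequence[float]) -> "Model": ...

    def predict(self, X: Sequence[Any]) -> List[float]: ...


class MeanRegressor:
    """Hypothetical minimal model that always predicts the mean training label.

    It inherits from no base class; implementing fit and predict is enough.
    """

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(self, X: Sequence[Any], y: Sequence[float]) -> "MeanRegressor":
        if len(X) != len(y) or len(y) == 0:
            raise ValueError("X and y must be non-empty and of equal length")
        self.mean_ = sum(y) / len(y)
        return self  # returning self mirrors the scikit-learn convention

    def predict(self, X: Sequence[Any]) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("fit() must be called before predict()")
        return [self.mean_] * len(X)


model = MeanRegressor().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
print(isinstance(model, Model))          # True: structural check, no inheritance needed
print(model.predict([[10.0], [20.0]]))   # [4.0, 4.0]
```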
Once defined, such algorithms can be used by Tasks that work with models, most importantly the dqa.tasks.ml.Training and dqa.tasks.ml.Prediction classes. For example, training a model can be specified as a Task with the following entry:
{
    "Class": "Training",
    "input_name": "X",
    "labels_name": "Labels",
    "algo": "%ALGO:MyAlgo"
}
Here, "%ALGO:MyAlgo" refers to the algorithm defined under the name MyAlgo in the Algos section.
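The link between the Algos section and a Task entry can be sketched as two steps: instantiate each algorithm from its Class and remaining parameters, then replace "%ALGO:<name>" values by the corresponding object. The registry ALGO_CLASSES, the helpers build_algos and resolve_algo_refs, and the stand-in ConstantRegressor (used instead of MLPRegressor so that the sketch does not require scikit-learn) are hypothetical. Whether dqa creates one shared instance per entry, as assumed here, is not stated in this documentation, and only top-level values of the task entry are resolved.

```python
from typing import Any, Dict, Type

ALGO_PREFIX = "%ALGO:"


class ConstantRegressor:
    """Hypothetical stand-in for a scikit-learn-style model (fit/predict only)."""

    def __init__(self, value: float = 0.0, verbose: bool = False) -> None:
        self.value = value
        self.verbose = verbose

    def fit(self, X, y):
        return self  # nothing to learn; always predicts self.value

    def predict(self, X):
        return [self.value] * len(X)


# Hypothetical registry standing in for ConfigSystem.load_algorithm_class().
ALGO_CLASSES: Dict[str, Type[Any]] = {"ConstantRegressor": ConstantRegressor}


def build_algos(algos_section: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Instantiate every "Algos" entry: "Class" picks the class, the rest are kwargs."""
    algos = {}
    for name, spec in algos_section.items():
        kwargs = dict(spec)  # copy, so the parsed configuration is left unchanged
        cls = ALGO_CLASSES[kwargs.pop("Class")]
        algos[name] = cls(**kwargs)  # assumption: one shared instance per entry
    return algos


def resolve_algo_refs(task_entry: Dict[str, Any], algos: Dict[str, Any]) -> Dict[str, Any]:
    """Replace top-level string values "%ALGO:<name>" with the matching algorithm object."""
    resolved = {}
    for key, value in task_entry.items():
        if isinstance(value, str) and value.startswith(ALGO_PREFIX):
            resolved[key] = algos[value[len(ALGO_PREFIX):]]
        else:
            resolved[key] = value
    return resolved


algos = build_algos({"MyAlgo": {"Class": "ConstantRegressor", "value": 1.5, "verbose": True}})
training = resolve_algo_refs(
    {"Class": "Training", "input_name": "X", "labels_name": "Labels", "algo": "%ALGO:MyAlgo"},
    algos,
)
print(type(training["algo"]).__name__, training["algo"].value)  # ConstantRegressor 1.5
```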