Configuration

Multiple tasks can be arranged in a TaskList to form a data-processing pipeline. To give such pipelines a simple, standardized format, they can be defined in a JSON file.

The class dqa.config.ConfigSystem can parse such JSON files. Their structure is described here.

The root dictionary in a JSON configuration file can contain the following keys: TaskLists, Parameters, Includes, Algos.
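
As a quick sanity check before handing a file to the parser, the root keys can be compared with the four listed above. The following is a minimal sketch using only the standard json module. The helper unknown_root_keys is hypothetical and not part of dqa, and whether ConfigSystem itself rejects or ignores unknown keys is not stated here.

```python
import json

# Root keys named in this section.
KNOWN_ROOT_KEYS = {"TaskLists", "Parameters", "Includes", "Algos"}


def unknown_root_keys(json_text):
    """Return the root-level keys that are not among the documented ones."""
    root = json.loads(json_text)
    if not isinstance(root, dict):
        raise ValueError("the root of a configuration must be a JSON object")
    return sorted(set(root) - KNOWN_ROOT_KEYS)


example = '{"Parameters": {"parameter1": 5}, "Tasklists": {}}'
print(unknown_root_keys(example))  # the misspelled "Tasklists" is reported
```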

Key: Parameters

Under this key, a dictionary of general parameters can be specified, as the following example shows:

Parameters example
{
    "Parameters": {
        "parameter1": 5,
        "parameter2": "value2"
    }
}

The values of these parameters can be used in the rest of the file. Any occurrence of the string “%parameter1” will be replaced by the value of “parameter1”, here 5.
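
The replacement rule can be illustrated with a small stand-alone sketch. It is not the dqa implementation: the function substitute is hypothetical, and it assumes that a string consisting of exactly one placeholder takes on the parameter's value and type, while placeholders embedded in a longer string are replaced textually.

```python
def substitute(node, parameters):
    """Recursively replace "%name" placeholders in a parsed JSON structure."""
    if isinstance(node, dict):
        return {key: substitute(value, parameters) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, parameters) for item in node]
    if isinstance(node, str):
        # A string that is exactly one placeholder takes the parameter's
        # value and type (an assumption of this sketch).
        if node.startswith("%") and node[1:] in parameters:
            return parameters[node[1:]]
        # Otherwise replace placeholders embedded in a longer string, longest
        # names first so that a defined "%parameter10" is not clobbered by
        # "%parameter1".
        for name in sorted(parameters, key=len, reverse=True):
            node = node.replace("%" + name, str(parameters[name]))
        return node
    return node


parameters = {"parameter1": 5, "parameter2": "value2"}
task = {"Class": "TaskClass1", "size": "%parameter1", "label": "run_%parameter2"}
result = substitute(task, parameters)
print(result)
```

Here "size" and "label" are made-up task parameters used only to show both cases.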

Key: TaskLists

Under this key, a dictionary maps names to TaskLists, each of which is a list of Tasks.

TaskLists config example
{
    "TaskLists": {
        "List1": [
            {
                "Class": "TaskClass1",
                "parameter1": "value1",
                "parameter2": "value2"
            },
            {
                "Class": "TaskClass2",
                "parameter1": "value1",
                "parameter2": "value2"
            }
        ]
    }
}

Each task in the list is represented by a dictionary of parameters. The main parameter, Class, specifies the class of the Task, which should be a subclass of dqa.tasks.tasks.Task. Note that custom classes must be registered with ConfigSystem.load_task_class(TaskClass) to be recognized. All parameters in the dictionary besides Class are passed to the constructor of this class when the objects are created.
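
The mechanism described above, a name lookup followed by a constructor call with the remaining keys, can be sketched as follows. This is an illustration only: TASK_CLASSES, register, build_task and the toy class Scale are hypothetical names, and the real registry behind ConfigSystem.load_task_class may work differently.

```python
# Hypothetical stand-in for the registry that load_task_class fills.
TASK_CLASSES = {}


def register(cls):
    """Make a class available under its own name."""
    TASK_CLASSES[cls.__name__] = cls
    return cls


@register
class Scale:
    """Toy task; a real one would subclass dqa.tasks.tasks.Task."""

    def __init__(self, factor, offset=0):
        self.factor = factor
        self.offset = offset


def build_task(entry):
    """Instantiate one task dictionary from a TaskList."""
    kwargs = dict(entry)              # copy so the config is not modified
    class_name = kwargs.pop("Class")  # "Class" selects the class ...
    return TASK_CLASSES[class_name](**kwargs)  # ... the rest goes to __init__


task = build_task({"Class": "Scale", "factor": 2, "offset": 1})
print(type(task).__name__, task.factor, task.offset)
```

One consequence of this design is that a misspelled parameter name in the JSON file surfaces as a TypeError from the constructor, so constructor signatures double as the schema for task entries.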

Instead of a single Task, an entire TaskList can also be embedded in another TaskList, as follows.

TaskLists config example, including other TaskLists
{
    "TaskLists": {
        "List1": [
            {
                "List": "List2",
                "parameters": { "p1": 7 }
            }
        ],
        "List2": [
        ]
    }
}

Here the Task dictionary contains List instead of Class. The parameters entry sets parameters for the nested list; inside that list, their values can be accessed with "%parameter_name".
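
To make the nesting concrete, the sketch below flattens a TaskList that refers to another one. It is not how dqa resolves lists internally: expand and resolve are hypothetical helpers, the content of List2 is invented for the example, and the sketch assumes that only the parameters given in the List entry are visible inside the nested list. It also does not guard against lists that include each other in a cycle.

```python
def resolve(value, parameters):
    """Replace a string of the form "%name" by the parameter's value."""
    if isinstance(value, str) and value.startswith("%") and value[1:] in parameters:
        return parameters[value[1:]]
    return value


def expand(task_lists, name, parameters=None):
    """Flatten the TaskList `name` into a list of plain task dictionaries."""
    parameters = parameters or {}
    tasks = []
    for entry in task_lists[name]:
        if "List" in entry:
            # Nested list: expand it with the parameters given in the entry.
            nested = {
                key: resolve(value, parameters)
                for key, value in entry.get("parameters", {}).items()
            }
            tasks.extend(expand(task_lists, entry["List"], nested))
        else:
            tasks.append({key: resolve(value, parameters) for key, value in entry.items()})
    return tasks


task_lists = {
    "List1": [{"List": "List2", "parameters": {"p1": 7}}],
    "List2": [{"Class": "TaskClass1", "parameter1": "%p1"}],
}
flat = expand(task_lists, "List1")
print(flat)
```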

Key: Includes

With this key, other JSON files can be incorporated into the current one, so that the TaskLists they define can be used here. JSON files intended for inclusion in other configurations are also referred to as templates.

Includes in config
{
    "Includes": [
        {
          "Filename": ["File1.json", "File2.json"]
        }
    ]
}

This includes File1.json and File2.json. If File1.json contains a TaskList named TaskList1, it can be embedded in another TaskList using the following entry.

Entry to include a task list from another file
{
    "List": "File1.TaskList1"
}
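
Judging from the example, the name of an included TaskList is prefixed with the file name without its extension. The sketch below builds such qualified names from an already parsed file. The helper qualified_task_lists is hypothetical, and the naming rule is inferred from the File1.TaskList1 example rather than taken from the dqa source.

```python
from pathlib import PurePath


def qualified_task_lists(filename, config):
    """Prefix every TaskList name in `config` with the file's base name."""
    prefix = PurePath(filename).stem  # "File1.json" -> "File1"
    return {
        f"{prefix}.{name}": tasks
        for name, tasks in config.get("TaskLists", {}).items()
    }


# Parsed content of a hypothetical File1.json.
file1 = {"TaskLists": {"TaskList1": [{"Class": "TaskClass1"}]}}
available = qualified_task_lists("File1.json", file1)
print(sorted(available))
```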

Key: Algos

Under this key, model algorithms can be defined, usually regressors or classifiers such as neural networks or support vector machines. For example:

Algorithm definition
{
    "Algos": {
        "MyAlgo": {
            "Class": "MLPRegressor",
            "verbose": true,
            "max_iter": 4500
        }
    }
}

As with Tasks, the Class parameter specifies the class to be loaded, and all other parameters are passed to the constructor of this class. A custom class can be registered with ConfigSystem.load_algorithm_class(AlgoClass).

The classes used here do not need to inherit from a specific base class. However, they need to implement fit(X, y) and predict(x) methods, like the model classes from the scikit-learn library. Indeed, the scikit-learn models can usually be used directly.
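
A custom algorithm therefore only needs these two methods. The class below is a deliberately trivial sketch in plain Python. MeanRegressor is a made-up example, not part of dqa or scikit-learn, and it assumes that returning self from fit, as scikit-learn estimators do, is acceptable to the tasks that use it.

```python
class MeanRegressor:
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.mean_ = None

    def fit(self, X, y):
        # X is ignored; only the targets determine the prediction.
        self.mean_ = sum(y) / len(y)
        if self.verbose:
            print(f"fitted mean: {self.mean_}")
        return self  # scikit-learn estimators return self from fit

    def predict(self, X):
        if self.mean_ is None:
            raise RuntimeError("call fit before predict")
        return [self.mean_ for _ in X]  # one prediction per input row


model = MeanRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
print(predictions)
```

After registration with ConfigSystem.load_algorithm_class(MeanRegressor), such a class could presumably be referenced under Algos with "Class": "MeanRegressor", with "verbose" passed on as a constructor argument.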

After their definition, such algorithms can be used for certain Tasks that work with models. Most importantly, this includes the dqa.tasks.ml.Training and dqa.tasks.ml.Prediction classes. For example, the training of a model can be specified as a Task with the following entry.

Model training task
{
    "Class": "Training",
    "input_name": "X",
    "labels_name": "Labels",
    "algo": "%ALGO:MyAlgo"
}