Configuration
=============

Multiple tasks can be arranged in a TaskList to form a pipeline to process data. To provide an elementary, standardized format for such pipelines, they can be defined in a JSON file. The class :class:`dqa.config.ConfigSystem` can parse such JSON files. Their structure is described here.

The root dictionary in a JSON configuration file can contain the following keys: ``TaskLists``, ``Parameters``, ``Includes``, ``Algos``.

Key: ``Parameters``
-------------------

Under this key, a dictionary of general parameters can be specified, as the following example shows:

.. code-block:: json
   :caption: Parameters example

   {
       "Parameters": {
           "parameter1": 5,
           "parameter2": "value2"
       }
   }

The values of these parameters can be used in the rest of the file. Any occurrence of the string ``"%parameter1"`` will be replaced by the value of ``parameter1``, here 5.

Key: ``TaskLists``
------------------

Under this key, a dictionary maps names to TaskLists, each of which is a list of Tasks.

.. code-block:: json
   :caption: TaskLists config example

   {
       "TaskLists": {
           "List1": [
               {
                   "Class": "TaskClass1",
                   "parameter1": "value1",
                   "parameter2": "value2"
               },
               {
                   "Class": "TaskClass2",
                   "parameter1": "value1",
                   "parameter2": "value2"
               }
           ]
       }
   }

Each task in the list is represented by a dictionary containing parameters. The main parameter ``Class`` specifies the class of the Task. This should be a subclass of :class:`dqa.tasks.tasks.Task`. Note that custom classes must be registered with ``ConfigSystem.load_task_class(TaskClass)`` to be recognized. All parameters in the dictionary other than ``Class`` are passed to the constructor of this class when the objects are created.

Instead of a single Task, an entire TaskList can also be embedded in another TaskList in the following way.

.. code-block:: json
   :caption: TaskLists config example, including other TaskLists

   {
       "TaskLists": {
           "List1": [
               {
                   "List": "List2",
                   "parameters": {
                       "p1": 7
                   }
               }
           ],
           "List2": []
       }
   }

So instead of ``Class``, the Task dictionary contains ``List``. The ``parameters`` entry sets parameters for the nested list, where their values can be accessed with ``"%parameter_name"`` (in this example, ``"%p1"``).

Key: ``Includes``
-----------------

With this key, other JSON files can be incorporated into the current one. In this way, TaskLists from the included files can be used. JSON files that are meant to be included in other configurations are also referred to as templates.

.. code-block:: json
   :caption: Includes in config

   {
       "Includes": [
           {
               "Filename": ["File1.json", "File2.json"]
           }
       ]
   }

This includes ``File1.json`` and ``File2.json``. If ``File1.json`` contains a TaskList with the name ``TaskList1``, then this can be included into another TaskList using the following entry.

.. code-block:: json
   :caption: Entry to include a task list from another file

   {
       "List": "File1.TaskList1"
   }

Key: ``Algos``
--------------

In this section, model algorithms can be defined. These are usually regressors or classifiers such as neural networks or support vector machines. For example:

.. code-block:: json
   :caption: Algorithm definition

   {
       "Algos": {
           "MyAlgo": {
               "Class": "MLPRegressor",
               "verbose": true,
               "max_iter": 4500
           }
       }
   }

As with Tasks, the ``Class`` parameter specifies the class to be loaded, and all other parameters are passed to the constructor of this class. A custom class can be registered with ``ConfigSystem.load_algorithm_class(AlgoClass)``. The classes used here do not need to inherit from a specific base class. However, they need to implement ``fit(X, y)`` and ``predict(x)`` methods, like the model classes from the ``scikit-learn`` library. Indeed, the ``scikit-learn`` models can usually be used directly.

After their definition, such algorithms can be used for certain Tasks that work with models.
Most importantly, this includes the :class:`dqa.tasks.ml.Training` and :class:`dqa.tasks.ml.Prediction` classes. For example, the training of a model can be specified as a Task with the following entry.

.. code-block:: json
   :caption: Model training task

   {
       "Class": "Training",
       "input_name": "X",
       "labels_name": "Labels",
       "algo": "%ALGO:MyAlgo"
   }
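
To illustrate the ``fit(X, y)`` / ``predict(x)`` interface expected from classes listed under ``Algos``, the following is a minimal sketch of a custom algorithm class. The class name ``MeanRegressor`` and its ``offset`` parameter are hypothetical and only serve as an example; they are not part of ``dqa``. Assuming the class has been registered as described above, an ``Algos`` entry such as ``{"Class": "MeanRegressor", "offset": 1.0}`` would presumably pass ``offset`` to the constructor.

.. code-block:: python
   :caption: Sketch of a custom algorithm class (hypothetical example)

   class MeanRegressor:
       """Hypothetical algorithm: always predicts the mean training label."""

       def __init__(self, offset=0.0):
           # Constructor arguments correspond to the keys of the JSON
           # entry other than "Class".
           self.offset = offset
           self.mean_ = None

       def fit(self, X, y):
           # Learn a single constant from the labels; X is ignored here.
           self.mean_ = sum(y) / len(y)
           return self

       def predict(self, x):
           # Return one prediction per input row.
           return [self.mean_ + self.offset for _ in x]


   # Registration as described in the text above (not executed here):
   # ConfigSystem.load_algorithm_class(MeanRegressor)

   # Stand-alone usage, independent of any configuration file.
   model = MeanRegressor(offset=1.0).fit([[0], [1], [2]], [2.0, 4.0, 6.0])
   predictions = model.predict([[3], [4]])

No base class is required here; the class only has to provide the two methods, which is why ``scikit-learn`` estimators can usually be used without a wrapper.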