Internal data format

The Knowlestry Application Framework uses an internal data representation format, implemented in the module dqa.data_representation.data_representation. Its main purpose is to provide a standardized interface for passing data between processing units (“tasks”, implemented by the dqa.tasks.tasks.Task class).

The data structure is organized as a hierarchy with several levels. For most of these levels, the Task class provides a corresponding modify_* method. By overriding such a method, a task can be implemented by modifying the data separately for each unit on that level (for example, for each machine).

The following table summarizes the hierarchy of the data format along with the corresponding methods in the task class.

Data structure level                        Method in dqa.tasks.tasks.Task
dataset_dict: Dict[str, Dataset]            modify_dataset_dict(datasets)
    ↓ str key
dataset: Dataset                            modify_dataset(dataset)
    ↓ str key (in dataset.machines)
machine: Machine                            modify_machine(machine)
    ↓ int index (in machine.measurements)
measurement: Dict[str, Dict[str, Any]]      modify_measurement(measurement)
    ├─ 'metadata' key → metadata: Dict[str, Any]
    ↓ 'data' key
data: Dict[str, np.ndarray]
    ↓ str key
data_row: np.ndarray                        modify_data_row(data_row)
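How the modify_* hooks relate to one another can be illustrated with a minimal, self-contained sketch. It does not use the real dqa package: Machine, Dataset and SketchTask are hypothetical stand-ins, and the assumption that each default hook hands every unit on its level down to the hook one level below (leaving metadata untouched) is an illustration only, not a documented property of dqa.tasks.tasks.Task. The hypothetical ScaleTask shows the intended pattern of overriding just the level a task cares about.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

import numpy as np


# Hypothetical stand-ins for dqa.data_representation.data_representation.{Machine, Dataset}.
@dataclass
class Machine:
    measurements: List[Dict[str, Dict[str, Any]]] = field(default_factory=list)


@dataclass
class Dataset:
    machines: Dict[str, Machine] = field(default_factory=dict)


class SketchTask:
    """Hypothetical base class (not dqa's Task): each default hook passes
    every unit on its level to the hook one level further down."""

    def modify_dataset_dict(self, datasets: Dict[str, Dataset]) -> Dict[str, Dataset]:
        return {key: self.modify_dataset(ds) for key, ds in datasets.items()}

    def modify_dataset(self, dataset: Dataset) -> Dataset:
        dataset.machines = {key: self.modify_machine(m) for key, m in dataset.machines.items()}
        return dataset

    def modify_machine(self, machine: Machine) -> Machine:
        machine.measurements = [self.modify_measurement(m) for m in machine.measurements]
        return machine

    def modify_measurement(self, measurement: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        # Only the 'data' branch has a lower-level hook; 'metadata' is passed through unchanged.
        measurement['data'] = {key: self.modify_data_row(row) for key, row in measurement['data'].items()}
        return measurement

    def modify_data_row(self, data_row: np.ndarray) -> np.ndarray:
        return data_row  # identity by default


class ScaleTask(SketchTask):
    """Hypothetical example task that overrides only the lowest level."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def modify_data_row(self, data_row: np.ndarray) -> np.ndarray:
        return data_row * self.factor


datasets = {
    'Data': Dataset(machines={
        'M0': Machine(measurements=[
            {'metadata': {'sensor': 'S1'}, 'data': {'temp': np.array([1.0, 2.0])}},
        ]),
    }),
}
result = ScaleTask(2.0).modify_dataset_dict(datasets)
print(result['Data'].machines['M0'].measurements[0]['data']['temp'])  # [2. 4.]
```

Because the traversal lives in the base class, a concrete task only has to express its per-unit logic; whether the real Task class behaves exactly like this should be checked against its source.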

All data processed by the Knowlestry Application Framework is contained in a dictionary that maps string keys to dqa.data_representation.data_representation.Dataset objects. Each Dataset holds a dictionary machines that maps string identifiers to dqa.data_representation.data_representation.Machine objects, and each Machine stores its measurements in machine.measurements, indexed by integers.
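To make the nesting concrete, the following sketch builds a tiny dataset_dict and walks it down to the individual data rows. Dataset and Machine are hypothetical dataclass stand-ins that model only the machines and measurements attributes described above; the helper iter_data_rows and the example keys and values are illustrative and not part of dqa.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple

import numpy as np


# Hypothetical stand-ins modelling only the attributes described in the text.
@dataclass
class Machine:
    measurements: List[Dict[str, Dict[str, Any]]] = field(default_factory=list)


@dataclass
class Dataset:
    machines: Dict[str, Machine] = field(default_factory=dict)


def iter_data_rows(dataset_dict: Dict[str, Dataset]) -> Iterator[Tuple[str, str, int, str, np.ndarray]]:
    """Yield (dataset key, machine key, measurement index, row key, row) for every data row."""
    for ds_key, dataset in dataset_dict.items():
        for machine_key, machine in dataset.machines.items():
            for index, measurement in enumerate(machine.measurements):
                for row_key, row in measurement['data'].items():
                    yield ds_key, machine_key, index, row_key, row


dataset_dict = {
    'Data': Dataset(machines={
        'M0': Machine(measurements=[
            {'metadata': {'sensor': 'S1'}, 'data': {'my_data_row': np.array([0.1, 0.2, 0.3])}},
        ]),
    }),
}

for path in iter_data_rows(dataset_dict):
    print(path[:4], path[4].shape)  # ('Data', 'M0', 0, 'my_data_row') (3,)
```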

The Dataset and Machine classes both implement the [] operator, which gives access to the contained machines and measurements, respectively. For example,

Access operator example
dataset_dict['Data']['M0'][0]['data']['my_data_row']

accesses the data row my_data_row in the measurement with index 0 of the machine M0 in the dataset Data.
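The shorthand can be understood as the [] operator forwarding to the contained collection. A minimal sketch of that idea, assuming (this page does not confirm the implementation) that Dataset.__getitem__ delegates to machines and Machine.__getitem__ delegates to measurements; the classes below are stand-ins, not the dqa implementations:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

import numpy as np


@dataclass
class Machine:  # hypothetical stand-in
    measurements: List[Dict[str, Dict[str, Any]]] = field(default_factory=list)

    def __getitem__(self, index: int) -> Dict[str, Dict[str, Any]]:
        # machine[i] is shorthand for machine.measurements[i]
        return self.measurements[index]


@dataclass
class Dataset:  # hypothetical stand-in
    machines: Dict[str, Machine] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Machine:
        # dataset[key] is shorthand for dataset.machines[key]
        return self.machines[key]


dataset_dict = {
    'Data': Dataset(machines={
        'M0': Machine(measurements=[{'metadata': {}, 'data': {'my_data_row': np.arange(4.0)}}]),
    }),
}

short = dataset_dict['Data']['M0'][0]['data']['my_data_row']
explicit = dataset_dict['Data'].machines['M0'].measurements[0]['data']['my_data_row']
print(short is explicit)  # True
```

Under this assumption both expressions return the very same array object, so the operator is purely a convenience and involves no copying.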