Internal data format (dqa.data_representation.data_representation)

Internal data representation format for the DQA.

The internal data format is hierarchical. Data is organized into the following units, from outermost to innermost: Datasets -> Machines -> Measurements -> Data Rows

  • Data Rows are usually represented by numpy arrays, often representing time series. They can also be higher-dimensional arrays (e.g. a three-phase current). In that case, axis 0 should correspond to the time dimension, if there is one.

  • Measurements are dictionaries with two keys, ‘data’ and ‘metadata’, each of which maps to a nested dictionary. The ‘data’ sub-dictionary maps the name of each data row to the data row itself. The data rows do not all need to have the same dimensions. The ‘metadata’ sub-dictionary may contain additional information about the measurement. For each data row in the ‘data’ dictionary, there can be a separate metadata dictionary in measurement[‘metadata’] under the same name.

  • A Machine contains a list of measurements. Usually, all the included data belongs to the same part of the data source (e.g. the same engine). It is represented by the Machine class.

  • A Dataset contains a dictionary that maps the name of each machine to the machine itself.
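The hierarchy described above can be sketched with plain Python containers. This is only an illustration of the layout, not the library's own construction code: a list stands in for a Machine and a dictionary for a Dataset, and the row names, the machine name, the filename and the ‘sample_time’ entry are all hypothetical.

```python
import numpy as np

# Data rows: a 1-D time series and a 2-D three-phase current.
# Axis 0 is the time dimension in both cases.
n_samples = 1000
speed = np.zeros(n_samples)          # shape (1000,)
current = np.zeros((n_samples, 3))   # shape (1000, 3)

# A measurement: 'data' holds the rows by name, 'metadata' holds
# measurement-level entries plus optional per-row dictionaries.
measurement = {
    "data": {"speed": speed, "current": current},
    "metadata": {
        "filename": "run_001.csv",           # hypothetical entry
        "current": {"sample_time": 1e-4},    # hypothetical per-row metadata
    },
}

# Plain containers standing in for the Machine and Dataset classes.
machine_measurements = [measurement]
dataset_machines = {"engine_A": machine_measurements}

# Walk the hierarchy: machines -> measurements -> data rows.
for machine_name, measurements in dataset_machines.items():
    for index, meas in enumerate(measurements):
        for row_name, row in meas["data"].items():
            print(machine_name, index, row_name, row.shape)
```

In real use, the list and the dictionary would be wrapped in the Machine and Dataset classes documented below.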

class dqa.data_representation.data_representation.Dataset(machines: Dict[str, Machine], index: int = 0, total: int = 1)

This class represents a dataset, whose content is a dictionary mapping strings to machines.

Methods

join_datasets

join_datasets_by_measurement

keys

class dqa.data_representation.data_representation.Machine(measurements: List | None = None)

This class represents a machine, which contains a list of measurements.

Each measurement is a dictionary containing two keys:

  • ‘data’: A dictionary that maps string names to units referred to as ‘data rows’. Each data row usually consists of a numpy array. For a time series, for example, this is a one-dimensional array, but higher dimensions are also possible.

  • ‘metadata’: A dictionary containing meta information, for example the filename. For each data row in the ‘data’ dictionary, this metadata dictionary can also contain a dictionary under the same name that holds information specific to that data row, for example the sample time of a time series.
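As a minimal sketch of how per-row metadata might be consumed, the helper below looks up the metadata dictionary stored under a data row's name and uses a sample time to build a time axis. The helper `time_axis`, the key name ‘sample_time’ and the row names are assumptions made for illustration; the DQA does not necessarily define them.

```python
import numpy as np


def time_axis(measurement, row_name, key="sample_time"):
    """Return the time stamps of a data row, or None if no sample time is stored.

    `key` is a hypothetical metadata name, not one defined by the DQA.
    """
    row = measurement["data"][row_name]
    # Per-row metadata lives under the same name as the data row, if present.
    row_meta = measurement["metadata"].get(row_name, {})
    if key not in row_meta:
        return None
    # Axis 0 is the time dimension, so its length is the number of samples.
    return np.arange(row.shape[0]) * row_meta[key]


measurement = {
    "data": {"torque": np.ones(5), "temperature": np.ones(3)},
    "metadata": {
        "filename": "run_001.csv",           # measurement-level entry
        "torque": {"sample_time": 0.5},      # per-row entry for 'torque'
    },
}

t = time_axis(measurement, "torque")             # five time stamps, 0.5 apart
missing = time_axis(measurement, "temperature")  # no per-row metadata stored
```

Because per-row metadata is optional, the lookup falls back to an empty dictionary instead of assuming that every data row has an entry.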

Methods

add_measurement

join_machine_dicts

join_with

join_with_by_measurement

dqa.data_representation.data_representation.ensure_and_get_measurement(datasets: Dict[str, Dataset], dataset_name: str, machine_name: str, measurement_index: int) → Dict[str, Dict[str, Any]]

Ensures that a measurement with the specified identifier exists within a dictionary of datasets, and returns that measurement.

Parameters:
  • datasets – Dictionary of datasets.

  • dataset_name – Name of the dataset the measurement should be contained in.

  • machine_name – Name of the machine the measurement should be contained in.

  • measurement_index – Index of the measurement within the machine.

Returns:

The specified measurement. If it does not exist yet, it will be created.
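The get-or-create behaviour described here can be illustrated with a stand-in that works on plain nested containers. This is a sketch under stated assumptions, not the library's implementation: `ensure_and_get` is a hypothetical helper, dictionaries and lists replace Dataset and Machine, and filling skipped indices with empty measurements is an assumption about behaviour the documentation does not specify.

```python
from typing import Any, Dict, List

Measurement = Dict[str, Dict[str, Any]]


def ensure_and_get(
    datasets: Dict[str, Dict[str, List[Measurement]]],
    dataset_name: str,
    machine_name: str,
    measurement_index: int,
) -> Measurement:
    """Hypothetical stand-in: create missing levels, then return the measurement."""
    machines = datasets.setdefault(dataset_name, {})
    measurements = machines.setdefault(machine_name, [])
    # Assumption: indices that do not exist yet are filled with empty measurements.
    while len(measurements) <= measurement_index:
        measurements.append({"data": {}, "metadata": {}})
    return measurements[measurement_index]


datasets: Dict[str, Dict[str, List[Measurement]]] = {}
m = ensure_and_get(datasets, "train", "engine_A", 1)
m["data"]["speed"] = [0.0, 1.0]  # a list as placeholder; a numpy array in practice

# A second call with the same identifier returns the same object.
same = ensure_and_get(datasets, "train", "engine_A", 1)
```

Returning the stored object rather than a copy means that data rows added to the returned measurement are visible through the dataset dictionary.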