Internal data format (dqa.data_representation.data_representation)
Internal data representation format for the DQA.
The internal data format is hierarchical and organizes the data into the following units: Datasets -> Machines -> Measurements -> Data Rows
Data Rows are usually numpy arrays, often holding time series. They can also be higher-dimensional arrays (e.g. a three-phase current). In that case, axis 0 should correspond to the time dimension, if there is one.
Measurements are dictionaries with two keys, ‘data’ and ‘metadata’, each mapping to a nested dictionary. The ‘data’ sub-dictionary contains all data rows, keyed by name. Note that these data rows do not all need to have the same dimensions. The ‘metadata’ sub-dictionary may contain additional information about the measurement. For each data row in the ‘data’ dictionary, there can be a separate metadata dictionary in measurement[‘metadata’] under the same name.
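The measurement layout described above can be sketched as a plain Python dictionary. This is only an illustration: the data row names (`speed`, `phase_current`), the `filename` value, and the `sample_time` metadata key are assumptions made for the example, not names defined by the DQA.

```python
import numpy as np

# Two data rows with different shapes; axis 0 is the time axis in both.
speed = np.zeros(1000)               # 1-D time series, shape (1000,)
phase_current = np.zeros((500, 3))   # three-phase current, shape (500, 3)

measurement = {
    "data": {
        "speed": speed,
        "phase_current": phase_current,
    },
    "metadata": {
        # Measurement-level information (illustrative key and value).
        "filename": "run_001.csv",
        # Per-data-row metadata, stored under the same name as the data row.
        "speed": {"sample_time": 0.01},
        "phase_current": {"sample_time": 0.02},
    },
}

# Look up each data row together with its optional metadata dictionary.
for name, row in measurement["data"].items():
    row_meta = measurement["metadata"].get(name, {})
    print(name, row.shape, row_meta)
```

Because per-row metadata is optional, the lookup uses `.get(name, {})` so that data rows without a metadata entry are still handled.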
A Machine contains a list of measurements. Usually, all of its data belong to the same part of the data source (e.g. the same engine). It is represented by the Machine class.
A Dataset contains a dictionary mapping to each machine by its name.
- class dqa.data_representation.data_representation.Dataset(machines: Dict[str, Machine], index: int = 0, total: int = 1)
This class represents a dataset, whose content is a dictionary mapping strings to machines.
Methods
join_datasets
join_datasets_by_measurement
keys
- class dqa.data_representation.data_representation.Machine(measurements: List | None = None)
This class represents a machine, which contains a list of measurements.
Each measurement is a dictionary containing two keys:
- ‘data’: A dictionary that maps string names to units referred to as ‘data rows’. Each data row usually consists of a numpy array. For a time series, for example, this is a one-dimensional array, but higher dimensions are also possible.
- ‘metadata’: A dictionary containing meta information, for example the filename. For each data row in the ‘data’ dictionary, this metadata dictionary can also contain a dictionary under the same name that holds information specific to that data row, for example the sample time of a time series.
Methods
add_measurement
join_machine_dicts
join_with
join_with_by_measurement
- dqa.data_representation.data_representation.ensure_and_get_measurement(datasets: Dict[str, Dataset], dataset_name: str, machine_name: str, measurement_index: int) → Dict[str, Dict[str, Any]]
Ensures that within an entire dataset dictionary, a measurement with a specified identifier exists. This measurement is also returned.
- Parameters:
datasets – Dictionary of datasets.
dataset_name – Name of the dataset the measurement should be contained in.
machine_name – Name of the machine the measurement should be contained in.
measurement_index – Index of the measurement within the machine.
- Returns:
The specified measurement. If it does not exist yet, it will be created.
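The get-or-create behaviour described above can be illustrated with a small stand-in that uses plain dictionaries and lists instead of the `Dataset` and `Machine` classes. This is a sketch, not the library's implementation: the helper name `get_or_create_measurement` is hypothetical, and both the creation of missing datasets and machines and the padding with empty measurements up to the requested index are assumptions about how "ensuring" a measurement might work.

```python
from typing import Any, Dict, List

Measurement = Dict[str, Dict[str, Any]]


def get_or_create_measurement(
    datasets: Dict[str, Dict[str, List[Measurement]]],
    dataset_name: str,
    machine_name: str,
    measurement_index: int,
) -> Measurement:
    """Hypothetical stand-in: nested dicts/lists replace Dataset and Machine."""
    if measurement_index < 0:
        raise ValueError("measurement_index must be non-negative")
    # Create the dataset and machine levels on demand (assumption).
    machines = datasets.setdefault(dataset_name, {})
    measurements = machines.setdefault(machine_name, [])
    # Pad with empty measurements until the requested index exists (assumption).
    while len(measurements) <= measurement_index:
        measurements.append({"data": {}, "metadata": {}})
    return measurements[measurement_index]


datasets: Dict[str, Dict[str, List[Measurement]]] = {}
m = get_or_create_measurement(datasets, "plant_a", "engine_1", 2)
m["data"]["speed"] = [0.0, 1.5, 3.0]

# The returned dictionary is the stored object, so the change is visible here.
print(len(datasets["plant_a"]["engine_1"]))  # prints 3
```

Returning the stored dictionary rather than a copy lets the caller fill in ‘data’ and ‘metadata’ in place after the call.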