dqa.connectors.importer.DataImporter#

class dqa.connectors.importer.DataImporter(root_dir: str = '.', meta_info: dict | None = None, folders: List[List[str | float | None]] | None = None, filename: str | List[str] | None = None, files_per_folder: int | None = None, output_dataset: str = 'Data', machine_name: str = 'M0', filename_regex: str | None = None, batch_id_name: str | None = None, filename_in_dataset: str | None = None, abs_path: bool = False, **kwargs)#

Abstract base class for a data importer. By default, it works with input files; however, keys such as folder and filename can also refer to other sources, e.g. keys in a database.

Parameters:
  • root_dir (str, default='.') – The root directory of the input files.

  • meta_info (dict, default=None) – A dictionary of metadata that can be added to the metadata of the imported data.

  • folders (list of lists, default=None) – List of folders to be imported. Each folder is given by a list of the form [folder_name, start, end, step]. From the folder root_dir/folder_name, the files [start:end:step] among the available files (those matching the expected file ending) are imported. The start, end, and step entries are optional.

  • filename (str or list of str, default=None) – If specified, exactly these files are imported from each folder; the start, end, … fields of the folders parameter are then ignored.

  • files_per_folder (int, default=None) – Number of files to import from each folder. If specified, then the step value for each folder parameter is overwritten accordingly.

  • output_dataset (str, default='Data') – The name of the dataset to be generated.

  • machine_name (str, default='M0') – The name of the machine in the resulting data structure.

  • filename_regex (str, default=None) – If set, then the files to be imported are determined by their filename matching this regular expression. The matched values for named groups in this regular expression are passed to the import_file function. By matching a group named ‘Machine’, the name of the output machine can be set.

  • batch_id_name (str, default=None) – If set, then the parameter with this name (from the groups parsed using the filename_regex) is used as the batch ID in the case that batch processing is enabled.

  • filename_in_dataset (str, default=None) – If specified and the filename parameter is not, then the names of the files to load within each folder are taken from an array in the current dataset. It is specified as <Dataset Name>/<Machine Name>/<Measurement Index>/<‘data’ or ‘metadata’>/<Data Row Name>.
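As an illustration of how named groups in filename_regex can select files and supply per-file parameters (the pattern and filenames below are invented for this example; only the behavior of the ‘Machine’ group is taken from the description above), the standard re module parses such groups like this:

```python
import re

# Hypothetical pattern matching filenames such as "M3_batch07_run1.csv",
# with named groups 'Machine' and 'Batch' (filenames invented for illustration).
pattern = re.compile(r"(?P<Machine>M\d+)_batch(?P<Batch>\d+)_run\d+\.csv")

files = ["M3_batch07_run1.csv", "M3_batch07_run2.csv", "notes.txt"]

# Keep only matching files and collect their named-group parameters,
# mirroring what filter_filenames_regex / list_folder are described to do.
matched = []
params = []
for name in files:
    m = pattern.fullmatch(name)
    if m:
        matched.append(name)
        params.append(m.groupdict())

print(matched)    # ['M3_batch07_run1.csv', 'M3_batch07_run2.csv']
print(params[0])  # {'Machine': 'M3', 'Batch': '07'}
```

A group named ‘Machine’ parsed this way would set the output machine name; a group whose name equals batch_id_name would supply the batch ID.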

Methods

filter_filenames(files, start, end, step)

For a list of available filenames, take a subset specified by start, end, step.
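One plausible reading of this slicing, as a sketch (the actual implementation may differ, e.g. in sorting or in how None defaults are handled):

```python
def filter_filenames(files, start=None, end=None, step=None):
    # Take the [start:end:step] subset of the sorted available filenames.
    # None entries fall back to Python's default slice behaviour.
    return sorted(files)[slice(start, end, step)]

available = ["f0.csv", "f1.csv", "f2.csv", "f3.csv", "f4.csv"]
print(filter_filenames(available, 1, None, 2))  # ['f1.csv', 'f3.csv']
```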

filter_filenames_regex(files)

Filter filenames by the regular expression specified in the constructor.

finish()

Can perform cleanup actions after the task has finished, e.g. closing network connections.

get_import_parameters(folder_index)

Returns filename, folder_name, start, end, step for the folder with index folder_index.
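The return order comes from the summary above; how missing optional entries of a folders item are filled in is an assumption in this sketch:

```python
def get_import_parameters(folders, filename, folder_index):
    # Sketch: unpack a [folder_name, start, end, step] entry, padding
    # any omitted optional entries with None (assumed behaviour).
    entry = folders[folder_index]
    folder_name = entry[0]
    start, end, step = (list(entry[1:]) + [None, None, None])[:3]
    return filename, folder_name, start, end, step

folders = [["run_a", 0, 10], ["run_b"]]
print(get_import_parameters(folders, None, 0))  # (None, 'run_a', 0, 10, None)
```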

get_start_end_step(start, end, step, ...)

Adjusts start, end, step if the num_files parameter is set.
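One plausible interpretation of this adjustment (a sketch only; the function name and return order follow the summary above, but the rounding rule is an assumption): if the number of files is fixed, the step is recomputed so that roughly that many files fall in the [start:end] range.

```python
def get_start_end_step(start, end, step, num_files, total):
    # Sketch: if num_files is set, override step so that about
    # num_files files are selected from the [start:end] range.
    start = 0 if start is None else start
    end = total if end is None else end
    if num_files is not None:
        step = max(1, (end - start) // num_files)
    return start, end, step

# Selecting 4 files out of 20 available yields a step of 5.
print(get_start_end_step(None, None, None, 4, 20))  # (0, 20, 5)
```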

import_file(folder, filename, parameters, ...)

Imports a single file.

import_files(folder, filenames, parameters)

Imports all files given by a list of filenames.

import_folder(folder[, start, end, step, ...])

Imports all files from a folder.

list_folder(folder)

Lists all files in a folder. By default, this lists the relevant files on the file system (filtered by file ending or regex pattern). It could also be used for other purposes, for example to list the contents of a database.

Parameters:
  • folder – The name of the folder to be listed.

Returns:
  A tuple containing:
  • A list of all relevant filenames.
  • A list containing a dictionary of parameters for each filename. By default, these are obtained by parsing the named groups in the filename regex.

import_file2

in_out_default

input_output_dataset

input_output_machine

input_output_mode

input_output_name

list_batch_ids

list_total

log

modify_data_row

modify_dataset

modify_dataset_dict

modify_machine

modify_measurement

set_logging_level

transfer_metadata