dqa.connectors.importer.DataImporter
- class dqa.connectors.importer.DataImporter(root_dir: str = '.', meta_info: dict | None = None, folders: List[List[str | float | None]] | None = None, filename: str | List[str] | None = None, files_per_folder: int | None = None, output_dataset: str = 'Data', machine_name: str = 'M0', filename_regex: str | None = None, batch_id_name: str | None = None, filename_in_dataset: str | None = None, abs_path: bool = False, **kwargs)
Abstract base class for data importers. By default, it works with input files. However, keys such as folder and filename can also be used with other sources, e.g. as keys in a database.
- Parameters:
root_dir (str, default='.') – The root directory of the input files.
meta_info (dict, default=None) – A dictionary of metadata that can be added to the metadata of the imported data.
folders (list of lists) – List of folders to be imported. Each folder is given by a list of the form [folder_name, start, end, step]. Within the folder root_dir/folder_name, the slice [start:end:step] of the available files (those with a matching file ending) is imported. The start, end, step entries are optional.
filename (str or list of str, default=None) – If specified, then inside each folder, exactly these files are imported. The start, end, … fields in the folders variable are ignored.
files_per_folder (int, default=None) – Number of files to import from each folder. If specified, then the step value for each folder parameter is overwritten accordingly.
output_dataset (str, default='Data') – The name of the dataset to be generated.
machine_name (str, default='M0') – The name of the machine in the resulting data structure.
filename_regex (str, default=None) – If set, then the files to be imported are determined by their filename matching this regular expression. The matched values for named groups in this regular expression are passed to the import_file function. By matching a group named ‘Machine’, the name of the output machine can be set.
batch_id_name (str, default=None) – If set, then the parameter with this name (from the groups parsed using the filename_regex) is used as the batch ID in the case that batch processing is enabled.
filename_in_dataset (str, default=None) – If specified and the ‘filename’ parameter is not, then the names of the files to load within each folder are taken from an array in the current dataset. It is specified as <Dataset Name>/<Machine Name>/<Measurement Index>/<‘data’ or ‘metadata’>/<Data Row Name>.
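The [start:end:step] selection described for the folders parameter reads like ordinary Python slicing. The following is a minimal sketch of that reading, written in plain Python rather than against the dqa API; the file names and the select_files helper are hypothetical and exist only for illustration.

```python
# Hypothetical file listing; these names are invented for illustration.
available = [f"run_{i:03d}.csv" for i in range(10)]


def select_files(files, start=None, end=None, step=None):
    """Plain-Python stand-in for a [start:end:step] selection (not dqa code)."""
    return files[start:end:step]


# A folder entry of the form [folder_name, start, end, step].
folder_entry = ["experiment_a", 2, 8, 3]
folder_name, *bounds = folder_entry
selected = select_files(available, *bounds)

# An entry that omits start, end, step selects every available file.
all_files = select_files(available, *["experiment_b"][1:])

print(folder_name, selected)
```

Under this assumption, an entry such as ["experiment_a", 2, 8, 3] picks the files at positions 2 and 5 of the sorted listing. How the library orders the available files before slicing is not stated here, so that part should be checked against the implementation.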
Methods
filter_filenames(files, start, end, step) – For a list of available filenames, take a subset specified by start, end, step.
filter_filenames_regex(files) – Filter filenames by the regular expression specified in the constructor.
finish() – Can perform actions that are required to clean up after the task has finished, e.g. closing network connections.
get_import_parameters(folder_index) – Returns filename, folder_name, start, end, step for the folder with index folder_index.
get_start_end_step(start, end, step, ...) – Adjusts start, end, step if the num_files parameter is set.
import_file(folder, filename, parameters, ...) – Imports a single file.
import_files(folder, filenames, parameters) – Imports all files given by a list of filenames.
import_folder(folder[, start, end, step, ...]) – Imports all files from a folder.
list_folder(folder) – Lists all files in a folder. By default, this lists the relevant files on the file system (selected by file ending or regex pattern). It could also be used for other purposes, for example to list the contents of a database. Parameter: folder, the name of the folder to be listed. Returns: a tuple containing a list of all relevant filenames and a list containing a dictionary of parameters for each filename. By default, these parameters are obtained by parsing the named groups in the filename regex.
import_file2
in_out_default
input_output_dataset
input_output_machine
input_output_mode
input_output_name
list_batch_ids
list_total
log
modify_data_row
modify_dataset
modify_dataset_dict
modify_machine
modify_measurement
set_logging_level
transfer_metadata
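The filename_regex parameter and the list_folder return value both rely on named groups in a regular expression. The sketch below shows the general idea using only Python's standard re module. It is not the library's implementation: the pattern, the file names, and the filter_by_regex helper are hypothetical, and whether the library matches the full filename or only a part of it is an assumption.

```python
import re

# Hypothetical pattern and filenames, for illustration only.
pattern = re.compile(r"(?P<Machine>M\d+)_batch(?P<Batch>\d+)\.csv")
files = ["M1_batch07.csv", "notes.txt", "M2_batch12.csv"]


def filter_by_regex(files, pattern):
    """Return matching filenames and one dict of named groups per match."""
    names, params = [], []
    for name in files:
        # Assumption: the whole filename has to match the pattern.
        match = pattern.fullmatch(name)
        if match:
            names.append(name)
            params.append(match.groupdict())
    return names, params


names, params = filter_by_regex(files, pattern)
for name, groups in zip(names, params):
    print(name, groups)
```

In this sketch, a group named Machine yields the machine name per file, and a group such as Batch could play the role that batch_id_name refers to.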