NAME

EAI::Wrap - framework for easy creation of Enterprise Application Integration tasks

SYNOPSIS

    # site.config
    %config = (
        sensitive => {
                        dbSys => {user => "DBuser", pwd => "DBPwd"},
                        ftpSystem1 => {user => "FTPuser", pwd => "FTPPwd", privKey => 'path_to_private_key', hostkey =>'hostkey'},
                },
        checkLookup => {"task_script.pl" => {errmailaddress => "test\@test.com", errmailsubject => "testjob failed", timeToCheck => "0800", freqToCheck => "B", logFileToCheck => "test.log", logcheck => "started.*"}},
        executeOnInit => sub {$execute{addToScriptName} = "doWhateverHereToModifySettings";},
        folderEnvironmentMapping => {Test => "Test", Dev => "Dev", "" => "Prod"},
        errmailaddress => 'your@mail.address',
        errmailsubject => "No errMailSubject defined",
        fromaddress => 'service@mail.address',
        smtpServer => "a.mail.server",
        smtpTimeout => 60,
        testerrmailaddress => 'your@mail.address',
        logRootPath => {"" => "C:/dev/EAI/Logs",},
        historyFolder => {"" => "History",},
        historyFolderUpload => "HistoryUpload",
        redoDir => {"" => "redo",},
        task => {
                redoTimestampPatternPart => '[\d_]',
                retrySecondsErr => 60*5,
                retrySecondsErrAfterXfails => 60*10,
                retrySecondsXfails => 2,
                retrySecondsPlanned => 60*15,
        },
        DB => {
                server => {Prod => "ProdServer", Test => "TestServer"},
                cutoffYr2000 => 60,
                DSN => 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;',
                schemaName => "dbo",
        },
        FTP => {
                lookups => {
                        ftpSystem1 => {remoteHost => {Test => "TestHost", Prod => "ProdHost"}, port => 5022},
                },
                maxConnectionTries => 5,
                sshInstallationPath => "C:/dev/EAI/putty/PLINK.EXE",
        },
        File => {
                format_defaultsep => "\t",
                format_thousandsep => ",",
                format_decimalsep => ".",
        }
    );

    # task_script.pl
    use EAI::Wrap;
    %common = (
        FTP => {
                remoteHost => {"Prod" => "ftp.com", "Test" => "ftp-test.com"},
                remoteDir => "/reports",
                port => 22,
                user => "myuser",
                privKey => 'C:/keystore/my_private_key.ppk',
                FTPdebugLevel => 0, # ~(1|2|4|8|16|1024|2048)
        },
        DB => {
                tablename => "ValueTable",
                deleteBeforeInsertSelector => "rptDate = ?",
                dontWarnOnNotExistingFields => 1,
                database => "DWH",
        },
        task => {
                plannedUntil => "2359",
        },
    );
    @loads = (
        {
                File => {
                        filename => "Datafile1.XML",
                        format_XML => 1,
                        format_sep => ',',
                        format_xpathRecordLevel => '//reportGrp/CM1/*',
                        format_fieldXpath => {rptDate => '//rptHdr/rptDat', NotionalVal => 'NotionalVal', tradeRef => 'tradeRefId', UTI => 'UTI'}, 
                        format_header => "rptDate,NotionalVal,tradeRef,UTI",
                },
        },
        {
                File => {
                        filename => "Datafile2.txt",
                        format_sep => "\t",
                        format_skip => 1,
                        format_header => "rptDate       NotionalVal     tradeRef        UTI",
                },
        }
    );
    setupEAIWrap();
    standardLoop();

DESCRIPTION

EAI::Wrap provides a framework for defining EAI jobs directly in Perl, sparing the creator from low-level tasks such as FTP fetching, file parsing and storing into a database. It can also be used to handle other workflows, like creating files from the database and uploading them to FTP servers, or invoking other externally provided tools.

A job is defined by first setting up the datastructures for the configurations and then scripting the job itself at a high level using the provided subs (although any Perl code is welcome here!).

EAI::Wrap has a lot of infrastructure already included: logging using Log4perl, database handling with DBI and DBD::ODBC, FTP services using Net::SFTP::Foreign, file parsing using Text::CSV (text files), Data::XLSX::Parser and Spreadsheet::ParseExcel (Excel files) and XML::LibXML (XML files), and file writing with Spreadsheet::WriteExcel and Excel::Writer::XLSX (Excel files) and Text::CSV (text files).

Furthermore it provides very flexible command-line options, allowing almost all configurations to be set on the command line. Command-line options of the task script (e.g. additional information passed with the interactive option) are fetched at INIT, allowing the use of options within the configuration, e.g. $opt{process}{interactive_startdate} for a passed start date.

The logging configured in $ENV{EAI_WRAP_CONFIG_PATH}/log.config (with the logfile root path set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config) also starts immediately at INIT of the task script; to use a logger, simply make a call to get_logger(). For the logging configuration, see EAI::Common, setupLogging.
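
A minimal sketch of using the logger in a task script (assuming get_logger() is available as described above; the message text is made up):

  use EAI::Wrap;
  my $logger = get_logger();
  $logger->info("task script started");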

There are two accompanying scripts:

setDebugLevel.pl to easily modify the configured log-levels of the task-script itself and all EAI-Wrap modules.

checkLogExist.pl to run checks on the produced logs (at given times using a cron-job or other scheduler) for their existence and certain (starting/finishing) entries, giving error notifications if the check failed.

API: datastructures for configurations

%config

global config (set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config, amended with $ENV{EAI_WRAP_CONFIG_PATH}/additional/*.config), contains special parameters (default error mail sending, logging paths, etc.) and site-wide pre-settings for the five categories in task scripts, described below under configuration categories.

%common

common configs for the task script, may contain one configuration hash for each configuration category.

@loads

list of hashes defining specific load processes within the task script. Each hash may contain one configuration hash for each configuration category.

configuration categories

The hashes mentioned above can contain five categories (sub-hashes): DB, File, FTP, process and task. These allow further parameters to be set for the respective parts of EAI::Wrap (EAI::DB, EAI::File and EAI::FTP) as well as process and task parameters. The parameters are described in detail in section CONFIGURATION REFERENCE.

The process category is used on the one hand to pass information within each process (data, additionalLookupData, filenames, hadErrors or custom command-line parameters starting with interactive), and on the other hand for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD). The task category contains parameters used on the task-script level and is therefore only allowed in %config and %common. It contains parameters for skipping, retrying and redoing the whole task script.

The settings in DB, File, FTP and task are "merge" inherited in a cascading manner (i.e. missing parameters are merged, parameters already set below are not overwritten):

 - %config (defined in site.config and other associated configs. This is being loaded at INIT)
 merged into ->
 - %common (common task parameters defined in script. This is being loaded when calling setupEAIWrap())
 merged into each instance of ->
 - $loads[] (only if loads are defined, you can also stay with %common if there is only one load in the script)

Special config parameters and DB, FTP, File and task parameters from command-line options are merged at the respective level (config at the top, the rest at the bottom) and always override any parameters set there. Only scalar parameters can be given on the command line; lists and hashes are not possible. Command-line options are given in the format:

  --<category> <parameter>=<value>

for the common level and

  --load<i><category> <parameter>=<value>

for the loads level.

Command line options are also available to the script via the hash %opt or the list of hashes @optloads, so in order to access the cmdline option --process interactive_date=20230101 you could use either $common{process}{interactive_date} or $opt{process}{interactive_date}.

In order to use --load1process interactive_date=20230101, you would use $loads[1]{process}{interactive_date} or $optloads[1]{process}{interactive_date}.
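
As an illustration (parameter names and values are only examples), setting options on both levels and accessing them in the task script might look like this:

  # command line:
  #   perl task_script.pl --process interactive_date=20230101 --load1File filename=SpecialFile.txt

  # in task_script.pl (after setupEAIWrap()):
  my $rundate  = $common{process}{interactive_date};   # same as $opt{process}{interactive_date}
  my $filename = $loads[1]{File}{filename};            # same as $optloads[1]{File}{filename}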

The merge inheritance for DB, FTP, File and task can be prevented by using an underscore after the hash key, i.e. DB_, FTP_, File_ and task_. In this case the parameters are not merged from common. However, they are always inherited from config.
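
For example (a sketch, the field values are made up), a load whose DB settings should not be merged from %common could be defined like this:

  @loads = (
        {
                DB_ => {                # underscore: not merged from %common{DB}, but still inherited from %config{DB}
                        tablename => "OtherTable",
                        database  => "OtherDB",
                },
                File => {               # merged from %common{File} as usual
                        filename => "Datafile3.txt",
                },
        },
  );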

A special merge is done for configurations defined in hash lookups, which may appear in all five categories (sub-hashes) of the top-level configuration %config. This uses the prefix defined in the task script's %common configuration to get generally defined settings for this specific prefix. As an example, common remoteHosts or ports for FTP can be defined here. These settings also allow an environment dependent hash, like {Test => "TestHost", Prod => "ProdHost"}.

%execute

hash of parameters for current task execution, which is not set by the user but can be read to set other parameters and control the flow. Most important here are $execute{env}, giving the currently used environment (Prod, Test, Dev, whatever), $execute{envraw} (same as $execute{env}, with Production being empty here), the several file lists (files being processed, files for deletion/moving, etc.), flags for ending/interrupting processing, and directory locations such as the home dir and history folders for processed files.

Detailed information about these parameters can be found in the section execute of the configuration parameter reference; there are parameters for files (filesProcessed, filesToDelete, filesToMoveinHistory, filesToMoveinHistoryUpload, retrievedFiles and uploadFilesToDelete), directories (homedir, historyFolder, historyFolderUpload and redoDir) and process controlling parameters (failcount, firstRunSuccess, retryBecauseOfError, retrySeconds and processEnd).

Retrying while $execute{processEnd} is false (this parameter is set during processingEnd(); combining this call and the check can be done in the loop header at the start with processingContinues()) can happen for two reasons: First, because task => {plannedUntil => "HHMM"} is set to a time until which the task has to be retried; however, this is done at most until midnight. Second, because an error occurred; in such a case $process->{hadErrors} is set for each load that failed. $process{successfullyDone} is also important in this context, as it prevents the repeated run of the following API procedures if the loads didn't have an error during their execution:

openDBConn, openFTPConn, getLocalFiles, getFilesFromFTP, getFiles, extractArchives, getAdditionalDBData, readFileData, dumpDataIntoDB, writeFileFromDB, putFileInLocalDir, uploadFileToFTP, uploadFileCMD, and uploadFile.

checkFiles is always run, regardless of $process{successfullyDone}.

After the first successful run of the task, $execute{firstRunSuccess} is set to prevent any error messages resulting of files having been moved/removed while rerunning the task until the defined planned time (task => {plannedUntil => "HHMM"}) has been reached.

initialization

The INIT procedure is executed at task script initialization (when EAI::Wrap is "use"d in the task script) and loads the site configuration, starts logging and reads command-line options. This means that everything passed to the script via command line may be used in the definitions, especially the task{interactive.*} parameters; here the name and the type of the parameter are not checked by the consistency checks (other parameters that are not allowed or have the wrong type throw an error). The task script's configuration itself is then read with setupEAIWrap(), which is usually called immediately after the datastructures for configurations have been finished.

API: High-level subs

Following are the high-level subs that can be called for a standard workflow. Most of them accumulate their sub names in process{successfullyDone} to prevent any further call in a faulting loop when they already ran successfully. Also, process{hadErrors} is set in case of errors to provide for error repeating. Downloaded files are collected in process{filenames} and completely processed files in process{filesProcessed}.

setupEAIWrap

setupEAIWrap is actually imported from EAI::Common, but as it is usually called as the first sub, it is mentioned here as well. This sub sets up the configuration datastructure and merges the hierarchy of configurations, more information in EAI::Common::setupEAIWrap.

removeFilesinFolderOlderX

Usually done for clearing FTP archives, this removes files on the FTP server that are older than a given time back (given in day/mon/year in remove => {removeFolders => ["",""], day=>, mon=>, year=>1}), see EAI::FTP::removeFilesOlderX (always runs in a faulting loop)
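
A sketch of a corresponding FTP remove configuration (folder name and age are assumptions):

  FTP => {
        remove => {removeFolders => ["Archive"], day => 0, mon => 3, year => 0},  # delete files in "Archive" older than 3 months
  },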

openDBConn ($)

argument $arg (ref to current load or common)

open a DB connection with the information provided in $DB->{user}, $DB->{pwd} (these can be provided by the sensitive information looked up using $DB->{prefix}) and $DB->{DSN} which can be dynamically configured using information from $DB itself, using $execute{env} inside $DB->{server}{*}: 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;', also see EAI::DB::newDBH

If the DSN information is not found in $DB, then an attempt is made to fetch a system-wide DSN for the set $DB{prefix} from $config{DB}{$DB{prefix}}{DSN}. This also respects environment information in $execute{env} if configured.

openFTPConn ($)

argument $arg (ref to current load or common)

open a FTP connection with the information provided in $FTP->{remoteHost}, $FTP->{user}, $FTP->{pwd}, $FTP->{hostkey}, $FTP->{privKey} (these four can be provided by the sensitive information looked up using $FTP->{prefix}) and $execute{env}, also see EAI::FTP::login

If the remoteHost information is not found in $FTP, then an attempt is made to fetch a system-wide remoteHost for the set $FTP{prefix} from $config{FTP}{$FTP{prefix}}{remoteHost}. This also respects environment information in $execute{env} if configured.
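
A brief sketch of how the prefix ties the task script to the centrally defined settings (using the ftpSystem1 entries from the SYNOPSIS):

  # user/pwd/privKey are looked up in $config{sensitive}{ftpSystem1},
  # remoteHost/port in the FTP lookups for ftpSystem1 defined in site.config
  %common = (
        FTP => {
                prefix    => "ftpSystem1",
                remoteDir => "/reports",
        },
  );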

redoFiles ($)

argument $arg (ref to current load or common)

redo file from redo directory if specified ($common{task}{redoFile} is being set), this is also being called by getLocalFiles and getFilesFromFTP. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop when called directly)

getLocalFiles ($)

argument $arg (ref to current load or common)

get local file(s) from source into homedir, checks files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File parameter. The processed files are put into process->{filenames} (always runs in a faulting loop). Uses $File->{filename}, $File->{extension} and $File->{avoidRenameForRedo}.

getFilesFromFTP ($)

argument $arg (ref to current load or common)

get file/s (can also be a glob for multiple files) from FTP into homedir, checks files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File and FTP parameters. The processed files are put into process->{filenames} (always runs in a faulting loop).

getFiles ($)

argument $arg (ref to current load or common)

combines above two procedures in a general procedure to get files from FTP or locally. Arguments are fetched from common or loads[i], using File and FTP parameters.

checkFiles ($)

argument $arg (ref to current load or common)

check files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File parameter. The processed files are put into process->{filenames} (always runs in a faulting loop). Important: files (their filenames) not retrieved by getFilesFromFTP or getLocalFiles have to be put into $execute{retrievedFiles} (e.g. push @{$execute{retrievedFiles}}, $filenameTobeChecked)!

extractArchives ($)

argument $arg (ref to current load or common)

extract files from an archive (only one archive is allowed). Arguments are fetched from common or loads[i], using only the process->{filenames} parameter that was filled by checkFiles. If not called via getFilesFromFTP/getLocalFiles and checkFiles, @{$process{filenames}} has to contain the archive filename.

getAdditionalDBData ($;$)

arguments $arg (ref to current load or common) and optional $refToDataHash

get additional data from DB. Arguments are fetched from common or loads[i], using DB and process parameters. You can also pass an optional ref to a data hash parameter to store the retrieved data there instead of $process->{additionalLookupData}
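
A sketch of the needed DB parameters and the optional target hash (the query and key field name are assumptions):

  $common{DB}{additionalLookup}     = "select rptDate, tradeRef, UTI from ValueTable";  # hypothetical lookup query
  $common{DB}{additionalLookupKeys} = ["tradeRef"];                                     # key field(s) of the returned hash
  getAdditionalDBData(\%common);                 # stores the result in $common{process}{additionalLookupData}
  # or pass your own hash instead:
  my %myLookup;
  getAdditionalDBData(\%common, \%myLookup);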

readFileData ($)

argument $arg (ref to current load or common)

read data from a file. Arguments are fetched from common or loads[i], using the File parameter. This parses the file content into the datastructure process{data}. Custom "hooks" can be defined with fieldCode and lineCode to modify and enhance the standard mapping defined in format_header. To access the final line data, the hash %EAI::File::line can be used (specific fields with $EAI::File::line{<target header column>}). If a field is being replaced using a different name from targetheader, the data with the original header name is placed in %EAI::File::templine. You can also access data from the previous line with %EAI::File::previousline and the previous temp line with %EAI::File::previoustempline.
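
A hedged sketch of such hooks (field names follow the SYNOPSIS example, the processing itself is only illustrative):

  File => {
        filename      => "Datafile1.csv",
        format_sep    => ",",
        format_header => "rptDate,NotionalVal,tradeRef,UTI",
        fieldCode     => {
                # normalize the date separator in rptDate of the current line
                rptDate => sub {$EAI::File::line{rptDate} =~ s/\./-/g;},
                # empty key: invoked for all fields; skip the line if tradeRef is missing
                "" => sub {$EAI::File::skipLineAssignment = 1 if !$EAI::File::line{tradeRef};},
        },
  },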

dumpDataIntoDB ($)

argument $arg (ref to current load or common)

store data into Database. Arguments are fetched from common or loads[i], using DB and File (for emptyOK) parameters.

markProcessed ($)

argument $arg (ref to current load or common)

mark files as being processed depending on whether there were errors, also decide on removal/archiving of downloaded files. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop)

writeFileFromDB ($)

argument $arg (ref to current load or common)

create data-files (excel or text) from Database. Arguments are fetched from common or loads[i], using DB and File parameters.

writeFileFromMemory ($$)

arguments $arg (ref to current load or common) and $data (ref to array of hash values coming from readFromDB or readText/readExcel/readXML)

create data-files (excel or text) from memory stored array of hash values. The created (in case of text files also appended) file information is taken from $arg, the data from $data.

putFileInLocalDir ($)

argument $arg (ref to current load or common)

put files into local folder if required. Arguments are fetched from common or loads[i], using File parameter.

markForHistoryDelete ($)

argument $arg (ref to current load or common)

mark to be removed or be moved to history after upload. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop)

uploadFileToFTP ($)

argument $arg (ref to current load or common)

upload files to FTP. Arguments are fetched from common or loads[i], using FTP and File parameters.

uploadFileCMD ($)

argument $arg (ref to current load or common)

upload files using an upload command program. Arguments are fetched from common or loads[i], using File and process parameters.

uploadFile ($)

argument $arg (ref to current load or common)

combines above two procedures in a general procedure to upload files via FTP or CMD or to put into local dir. Arguments are fetched from common or loads[i], using File and process parameters

standardLoop (;$)

executes the given configuration in a standard extract/transform/load loop (as shown below); depending on whether loads are given, an additional loop is done over all loads within the @loads list. If the definition only contains the common hash then there is no loop. The additional optional parameter $getAddtlDBData activates getAdditionalDBData before reading in file data. No other processing is possible (creating files from data, uploading, etc.).

  while (processingContinues()) {
        if ($common{DB}{DSN}) {
                openDBConn(\%common,1) or $logger->error("failed opening DB connection");
        }
        if ($common{FTP}{remoteHost}) {
                openFTPConn(\%common,1) or $logger->error("failed opening FTP connection");
        }
        if (@loads) {
                for my $load (@loads) {
                        if (getFiles($load)) {
                                getAdditionalDBData($load) if $getAddtlDBData;
                                readFileData($load);
                                dumpDataIntoDB($load);
                                markProcessed($load);
                        }
                }
        } else {
                if (getFiles(\%common)) {
                        getAdditionalDBData(\%common) if $getAddtlDBData;
                        readFileData(\%common);
                        dumpDataIntoDB(\%common);
                        markProcessed(\%common);
                }
        }
  }

processingEnd

final processing steps for process ending (cleanup, FTP removal/archiving) or retry after pausing. No context argument as this always depends on all loads and/or the common definition (always runs in a faulting loop). Returns true if process ended and false if not. Using this as a check also works for do .. while or do .. until loops.

processingPause ($)

generally available procedure for pausing processing, argument $pauseSeconds gives the delay

processingContinues

Alternative and compact way to combine the call to processingEnd() and the check of $execute{processEnd} in one go in a while or until loop header. Returns true if the process continues and false if not. Caveat: This doesn't work for do .. while or do .. until loops! Instead of checking processingEnd() and processingContinues(), a check of !$execute{processEnd} can be done in the while or until header with a call to processingEnd() at the end of the loop.
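
A sketch of the do-loop variant using processingEnd() directly (as noted above, processingContinues() cannot be used here):

  do {
        # ... open connections, get files, read and store data as in the standard loop shown above ...
  } until (processingEnd());    # processingEnd() returns true once the task has really ended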

moveFilesToHistory (;$)

optional argument $archiveTimestamp

move transferred files marked for moving (filesToMoveinHistory/filesToMoveinHistoryUpload) into history and/or historyUpload folder. Optionally a custom timestamp can be passed.

deleteFiles ($)

argument $filenames, ref to array

delete transferred files given in $filenames

CONFIGURATION REFERENCE

config

parameter category for site global settings, usually defined in site.config and other associated configs loaded at INIT

checkLogExistDelay

ref to hash {Test => 2, Dev => 3, "" => 0}, mapping to set delays for checkLogExist per environment in $execute{env}, this can be further overridden per job (and environment) in checkLookup.

checkLookup

ref to datastructure {"scriptname.pl + optional addToScriptName" => {errmailaddress => "", errmailsubject => "", timeToCheck => "", freqToCheck => "", logFileToCheck => "", logcheck => "", logRootPath => ""},...} used for the logchecker. Each entry of the hash lookup table defines a log to be checked: errmailaddress receives error mails, errmailsubject is the mail subject, timeToCheck the earliest time to check for existence in the log, freqToCheck the frequency of checks (daily/monthly/etc.), logFileToCheck the name of the logfile to check, logcheck the regex to check for in the logfile and logRootPath the folder where the logfile is found. lookup key: $execute{scriptname} + $execute{addToScriptName}

errmailaddress

default mail address for central logcheck/errmail sending

errmailsubject

default mail subject for central logcheck/errmail sending

executeOnInit

code to be executed during INIT of EAI::Wrap to allow for assignment of config/execute parameters from commandline params BEFORE Logging!

folderEnvironmentMapping

ref to hash {Test => "Test", Dev => "Dev", "" => "Prod"}, mapping for $execute{envraw} to $execute{env}

fromaddress

from address for central logcheck/errmail sending, also used as default sender address for sendGeneralMail

historyFolder

ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where downloaded files are historized, lookup key as in checkLookup, default in "" => "defaultfolder". historyFolder, historyFolderUpload, logRootPath and redoDir are always built with an environment subfolder, the default is built as folderPath/endFolder/environ, otherwise it is built as folderPath/environ/endFolder. Environment subfolders (environ) are also built depending on prodEnvironmentInSeparatePath: either folderPath/endFolder/$execute{env} (prodEnvironmentInSeparatePath = true, Prod has own subfolder) or folderPath/endFolder/$execute{envraw} (prodEnvironmentInSeparatePath = false, Prod is in common folder, other environments have their own folder)

historyFolderUpload

ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where uploaded files are historized, lookup key as in checkLookup, default in "" => "defaultfolder"

logCheckHoliday

calendar for business days in central logcheck/errmail sending. Built-in calendars are AT (Austria), TG (Target), UK (United Kingdom) and WE (weekends only). Calendars can be added with EAI::DateUtil::addCalendar.

logs_to_be_ignored_in_nonprod

regular expression to specify logs to be ignored in central logcheck/errmail sending

logprefixForLastLogfile

prefix for previous (day) logs to be set in error mail (link), if not given, defaults to get_curdate(). In case Log::Dispatch::FileRotate is used as the File Appender in Log4perl config, the previous log is identified with <logname>.1

logRootPath

ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, paths to log file root folders (environment is added to that if non production), lookup key as checkLookup, default in "" => "defaultfolder"

prodEnvironmentInSeparatePath

set to 1 if the production scripts/logs etc. are in a separate Path defined by folderEnvironmentMapping (prod=root/Prod, test=root/Test, etc.), set to 0 if the production scripts/logs are in the root folder and all other environments are below that folder (prod=root, test=root/Test, etc.)

redoDir

ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where files for redo are contained, lookup key as checkLookup, default in "" => "defaultfolder"

sensitive

hash lookup table ({"prefix" => {user=>"",pwd =>"",hostkey=>"",privkey =>""},...}) for sensitive access information in DB and FTP (lookup keys are set with DB{prefix} or FTP{prefix}), may also be placed outside of site.config; all sensitive keys can also be environment lookups, e.g. hostkey=>{Test => "", Prod => ""} to allow for environment specific setting

smtpServer

smtp server for (error) mail sending

smtpTimeout

timeout for smtp response

testerrmailaddress

error mail address in non-prod environments

execute

hash of parameters for current task execution. This is not to be set by the user, but can be read to set other parameters and to control the flow

alreadyMovedOrDeleted

hash for checking the already moved or deleted local files, to avoid moving/deleting them again at cleanup

addToScriptName

this can be set to be added to the scriptname for config{checkLookup} keys, e.g. some passed parameter.

env

Prod, Test, Dev, whatever is defined as the lookup value in folderEnvironmentMapping. homedir as fetched from the File::basename::dirname of the executing script using /^.*[\\\/](.*?)$/ is used as the key for looking up this value.

envraw

Production has a special significance here as being an empty string. Otherwise like env.

errmailaddress

target address for central logcheck/errmail sending in current process

errmailsubject

mail subject for central logcheck/errmail sending in current process

failcount

for counting failures in processing to switch to longer wait period or finish altogether

filesToDelete

list of files to be deleted locally after download, necessary for cleanup at the end of the process

filesToMoveinHistory

list of files to be moved in historyFolder locally, necessary for cleanup at the end of the process

filesToMoveinHistoryUpload

list of files to be moved in historyFolderUpload locally, necessary for cleanup at the end of the process

firstRunSuccess

for planned retries (process=>plannedUntil filled) -> this is set after the first run to avoid error messages resulting from files having been moved/removed.

freqToCheck

for logchecker: frequency to check entries (B,D,M,M1) ...

homedir

the home folder of the script, mostly used to return from redo and other folders for globbing files.

historyFolder

actually set historyFolder

historyFolderUpload

actually set historyFolderUpload

logcheck

for logchecker: the Logcheck (regex)

logFileToCheck

for logchecker: Logfile to be searched

logRootPath

actually set logRootPath

processEnd

specifies that the process is ended, checked in EAI::Wrap::processingEnd

redoDir

actually set redoDir

retrievedFiles

files retrieved from FTP or redo directory

retryBecauseOfError

retryBecauseOfError shows if a rerun occurs due to errors (for successMail)

retrySeconds

how many seconds are passed between retries. This is set on error with process=>retrySecondsErr and if planned retry is defined with process=>retrySecondsPlanned

scriptname

name of the current process script, also used in log/history setup together with addToScriptName for config{checkLookup} keys

timeToCheck

for logchecker: scheduled time of job (don't look earlier for log entries)

uploadFilesToDelete

list of files to be deleted locally after upload, necessary for cleanup at the end of the process

DB

DB specific configs

addID

this hash can be used to additionally set a constant to given fields: Fieldname => Fieldvalue

additionalLookup

query used in getAdditionalDBData to retrieve lookup information from DB using EAI::DB::readFromDBHash

additionalLookupKeys

used for getAdditionalDBData, list of field names to be used as the keys of the returned hash

cutoffYr2000

when storing date data with 2 year digits in dumpDataIntoDB/EAI::DB::storeInDB, this is the cutoff where years are interpreted as 19XX (> cutoffYr2000) or 20XX (<= cutoffYr2000)

columnnames

returned column names from EAI::DB::readFromDB and EAI::DB::readFromDBHash, this is used in writeFileFromDB to pass column information from database to writeText

database

database to be used for connecting

debugKeyIndicator

used in dumpDataIntoDB/EAI::DB::storeInDB as an indicator for keys for debugging information if primkey not given (errors are shown with this key information). Format is the same as for primkey

deleteBeforeInsertSelector

used in dumpDataIntoDB/EAI::DB::storeInDB to delete specific data defined by keydata before an insert (first occurrence in data is used for key values). Format is the same as for primkey ("key1 = ? ...")

dontWarnOnNotExistingFields

suppress warnings in dumpDataIntoDB/EAI::DB::storeInDB for not existing fields

dontKeepContent

if table should be completely cleared before inserting data in dumpDataIntoDB/EAI::DB::storeInDB

doUpdateBeforeInsert

invert insert/update sequence in dumpDataIntoDB/EAI::DB::storeInDB, insert only done when upsert flag is set

DSN

DSN String for DB connection

incrementalStore

when storing data with dumpDataIntoDB/EAI::DB::storeInDB, avoid setting empty columns to NULL

ignoreDuplicateErrs

ignore any duplicate errors in dumpDataIntoDB/EAI::DB::storeInDB

keyfields

used for EAI::DB::readFromDBHash, list of field names to be used as the keys of the returned hash

longreadlen

used for setting the database handle's LongReadLen parameter for the DB connection; if not set, defaults to 1024

lookups

similar to $config{sensitive}, a hash lookup table ({"prefix" => {remoteHost=>""},...} or {"prefix" => {remoteHost=>{Prod => "", Test => ""}},...}) for centrally looking up DSN Settings depending on $DB{prefix}. Overrides $DB{DSN} set in config, but is overriden by script-level settings in %common.

noDBTransaction

don't use a DB transaction for dumpDataIntoDB

noDumpIntoDB

if files from this load should not be dumped to the database

port

port to be added to server in environment hash lookup: {Prod => "", Test => ""}

postDumpExecs

array for DB executions done in dumpDataIntoDB after postDumpProcessing and before commit/rollback: [{execs => ['',''], condition => ''}]. For all execs a doInDB is executed if condition (evaluated string or anonymous sub: condition => sub {...}) is fulfilled
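
A sketch of such a definition (the SQL statement and the condition are assumptions):

  DB => {
        postDumpExecs => [{
                execs     => ['update ValueTable set loadComplete = 1'],        # hypothetical statement(s), each run via doInDB
                condition => sub {!$execute{retryBecauseOfError}},              # only execute on a regular (non-retry) run
        }],
  },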

postDumpProcessing

done in dumpDataIntoDB after EAI::DB::storeInDB, execute perl code in postDumpProcessing (evaluated string or anonymous sub: postDumpProcessing => sub {...})

postReadProcessing

done in writeFileFromDB after EAI::DB::readFromDB, execute perl code in postReadProcessing (evaluated string or anonymous sub: postReadProcessing => sub {...})

prefix

key for sensitive information (e.g. pwd and user) in config{sensitive} or system wide DSN in config{DB}{prefix}{DSN}. respects environment in $execute{env} if configured.

primkey

primary key indicator to be used for update statements, format: "key1 = ? AND key2 = ? ...". Not necessary for dumpDataIntoDB/storeInDB if dontKeepContent is set to 1, here the whole table content is removed before storing

pwd

for password setting, either directly (insecure -> visible) or via sensitive lookup

query

query statement used for EAI::DB::readFromDB and EAI::DB::readFromDBHash

schemaName

schemaName used in dumpDataIntoDB/EAI::DB::storeInDB; if tablename contains a dot, the schema extracted from tablename overrides this. Needed for datatype information!

server

DB Server in environment hash lookup: {Prod => "", Test => ""}

tablename

the table where data is stored in dumpDataIntoDB/EAI::DB::storeInDB

upsert

in dumpDataIntoDB/EAI::DB::storeInDB, specifies whether both update and insert should be done. doUpdateBeforeInsert=0: update after the insert failed (because of duplicate keys); doUpdateBeforeInsert=1: insert after the update failed (because the key does not exist).

user

for setting username in db connection, either directly (insecure -> visible) or via sensitive lookup

File

File fetching and parsing specific configs. File{filename} is also used for FTP

avoidRenameForRedo

when redoing, the cutoff (datetime/redo info) is usually removed following a pattern; set this flag to avoid that

append

for EAI::File::writeText: boolean to append (1) or overwrite (0 or undefined) to file given in filename

columns

for EAI::File::writeText: Hash of data fields, that are to be written (in order of keys)

columnskip

for EAI::File::writeText: boolean hash of column names that should be skipped when writing the file ({column1ToSkip => 1, column2ToSkip => 1, ...})

dontKeepHistory

if up- or downloaded file should not be moved into historyFolder but be deleted

dontMoveIntoHistory

if up- or downloaded file should not be moved into historyFolder but be kept in homedir

emptyOK

flag to specify whether empty files should not invoke an error message. Also needed to mark an empty file as processed in EAI::Wrap::markProcessed

extract

flag to specify whether to extract files from archive package (zip)

extension

the extension of the file to be read (optional, used for redoFile)

fieldCode

additional field based processing code: fieldCode => {field1 => 'perl code', ..}, invoked if key equals either header (as in format_header) or targetheader (as in format_targetheader) or invoked for all fields if key is empty {"" => 'perl code'}. set $EAI::File::skipLineAssignment to true (1) if current line should be skipped from data. perl code can be an evaluated string or an anonymous sub: field1 => sub {...}

filename

the name of the file to be read, can also be a glob spec to retrieve multiple files. This information is also used for FTP and retrieval and local file copying.

firstLineProc

processing done when reading the first line of text files in EAI::File::readText (used to retrieve information from a header line, like reference date etc.). The line is available in $_.

format_allowLinefeedInData

line feeds in values don't create artificial new lines/records, only works for csv quoted data in EAI::File::readText

format_autoheader

assumption: header exists in file and format_header should be derived from there. only for EAI::File::readText

format_beforeHeader

additional String to be written before the header in EAI::File::writeText

format_dateColumns

numeric array of columns that contain date values (special parsing) in excel files (EAI::File::readExcel)

format_decimalsep

decimal separator used in numbers of sourcefile (defaults to . if not given)

format_defaultsep

default separator when format_sep not given (usually in site.config), if no separator is given (not needed for EAI::File::readExcel/EAI::File::readXML), "\t" is used for parsing format_header and format_targetheader.

format_encoding

text encoding of the file in question (e.g. :encoding(utf8))

format_headerColumns

optional numeric array of columns that contain data in excel files (defaults to all columns starting with first column up to format_targetheader length)

format_header

format_sep separated string containing header fields (optional in excel files, only used to check against existing header row)

format_headerskip

skip until row-number for checking header row against format_header in EAI::File::readExcel

format_eol

for quoted csv specify special eol character (allowing newlines in values)

format_fieldXpath

for EAI::File::readXML, hash with field => xpath to content association entries

format_fix

for text writing, specify whether fixed length format should be used (requires format_padding)

format_namespaces

for EAI::File::readXML, hash with alias => namespace association entries

format_padding

for text writing, hash with field number => padding to be applied for fixed length format

format_poslen

array of array defining positions and lengths [[pos1,len1],[pos2,len2]...[posN,lenN]] of data in fixed length format text files (if format_sep == "fix")
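
For example, a fixed-length file might be read with a definition like this (positions and lengths are made up; the header fields are assumed to be separated by format_defaultsep here):

  File => {
        filename      => "FixedLength.txt",
        format_sep    => "fix",
        format_poslen => [[0,8],[8,12],[20,10]],       # rptDate: pos 0/len 8, NotionalVal: pos 8/len 12, tradeRef: pos 20/len 10
        format_header => "rptDate\tNotionalVal\ttradeRef",
  },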

format_quotedcsv

special parsing/writing of quoted csv data using Text::CSV

format_sep

separator string for EAI::File::readText and EAI::File::writeText csv formats, a regex for splitting other separated formats. If format_sep is not explicitly given as a regex here (=> qr//), it is still treated as a regex by split; however, this causes surprising effects with regex metacharacters (these should be quoted, e.g. qr/\|/)! Also used for splitting format_header and format_targetheader (Excel and XML formats use tab as the default separator here).

format_sepHead

special separator for header row in EAI::File::writeText, overrides format_sep

format_skip

either numeric or string, skip until row-number if numeric or appearance of string otherwise in reading textfile. If numeric, format_skip can also be used in EAI::File::readExcel

format_stopOnEmptyValueColumn

for EAI::File::readExcel, stop row parsing when a cell with this column number is empty (denotes end of data, to avoid very long parsing).

format_suppressHeader

for text and excel file writing, suppress output of header

format_targetheader

format_sep separated string containing target header fields (= the field names in target/database table). optional for XML and tabular textfiles, defaults to format_header if not given there.

format_thousandsep

thousand separator used in numbers of sourcefile (defaults to , if not given)

format_worksheetID

worksheet number for EAI::File::readExcel, this should always work

format_worksheet

alternatively the worksheet name can be passed for EAI::File::readExcel, this only works for new excel format (xlsx)

format_xlformat

excel format for parsing, also specifies that excel parsing should be done

format_xpathRecordLevel

xpath for level where data nodes are located in xml

format_XML

specify xml parsing

lineCode

additional line based processing code, invoked after whole line has been read (evaluated string or anonymous sub: lineCode => sub {...})

localFilesystemPath

if files are taken from or put to the local file system with getLocalFiles/putFileInLocalDir then the path is given here. Setting this to "." avoids copying files.

optional

to avoid error message for missing optional files, set this to 1

FTP

FTP specific configs

additionalParamsGet

additional parameters for Net::SFTP::Foreign get.

additionalMoreArgs

additional more args for Net::SFTP::Foreign new (args passed to ssh command).

additionalParamsNew

additional parameters for Net::SFTP::Foreign new.

additionalParamsPut

additional parameters for Net::SFTP::Foreign put.

archiveDir

folder for archived files on the FTP server

dontMoveTempImmediately

if 0 or missing: files are renamed/moved to their final name immediately after writing to FTP; otherwise (1): a call to EAI::FTP::moveTempFiles is required for that

dontDoSetStat

for Net::SFTP::Foreign, no setting of time stamp of remote file to that of local file (avoid error messages of FTP Server if it doesn't support this)

dontDoUtime

don't set time stamp of local file to that of remote file

dontUseQuoteSystemForPwd

for Windows, a special quoting is used for passing passwords containing [()"<>& to Net::SFTP::Foreign. This flag can be used to disable this quoting.

dontUseTempFile

directly upload files, without temp files

fileToArchive

should files be archived on FTP server? if archiveDir is not set, then file is archived (rolled) in the same folder

fileToRemove

should files be removed on FTP server?

FTPdebugLevel

debug ftp: 0 or ~(1|2|4|8|16|1024|2048), loglevel automatically set to debug for module EAI::FTP

hostkey

hostkey to present to the server for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup

hostkey2

additional hostkey to be presented (e.g. in case of round robin DNS)

localDir

optional: local folder for files to be placed, if not given files are downloaded into current folder

lookups

similar to $config{sensitive}, a hash lookup table ({"prefix" => {remoteHost=>""},...} or {"prefix" => {remoteHost=>{Prod => "", Test => ""}},...}) for centrally looking up remoteHost and port settings depending on $FTP{prefix}.

maxConnectionTries

maximum number of tries for connecting in login procedure

noDirectRemoteDirChange

if no direct change into absolute paths (/some/path/to/change/into) is possible, then set this to 1; this does a separate change into setcwd(undef) and setcwd(remoteDir)

onlyArchive

only archive/remove given files on the FTP server, requires archiveDir to be set

path

additional relative FTP path (under remoteDir which is set at login), where the file(s) is/are located

port

ftp/sftp port (leave empty for default port 22 when using Net::SFTP::Foreign, or port 21 when using Net::FTP)

prefix

key for sensitive information (e.g. pwd and user) in config{sensitive} or system wide remoteHost/port in config{FTP}{prefix}{remoteHost} or config{FTP}{prefix}{port}. respects environment in $execute{env} if configured.

privKey

sftp key file location for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup

pwd

for password setting, either directly (insecure -> visible) or via sensitive lookup

queue_size

queue_size for Net::SFTP::Foreign; if > 1 this often causes connection issues

remove

ref to hash {removeFolders=>[], day=>, mon=>, year=>1} for removing (archived) files with removeFilesOlderX; all files in removeFolders that are older than day days, mon months and year years are deleted

remoteDir

remote root folder for up-/download, archive and remove: "out/Marktdaten/", path is added then for each filename (load)

remoteHost

ref to hash of IP-addresses/DNS of host(s).

SFTP

to explicitly use SFTP, if not given SFTP will be derived from existence of privKey or hostkey

simulate

for removal of files using removeFilesinFolderOlderX/removeFilesOlderX: only simulate (1) or actually do it (0)?

sshInstallationPath

path where the ssh/plink executable to be used by Net::SFTP::Foreign is located

type

(A)scii or (B)inary, only applies to Net::FTP

user

set user directly, either directly (insecure -> visible) or via sensitive lookup

process

used to pass information within each process (data, additionalLookupData, filenames, hadErrors or commandline parameters starting with interactive) and for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD* and onlyExecFor)

additionalLookupData

additional data retrieved from database with EAI::Wrap::getAdditionalDBData

archivefilenames

in case a zip archive package is retrieved, the filenames of these packages are kept here, necessary for cleanup at the end of the process

countPercent

percentage for progress counting during File text reading and DB storing; if given (greater than 0), progress is shown each time the given percentage in countPercent is reached (e.g. every 10% if countPercent = 10). Any value >= 100 will count ALL lines...

data

loaded data: array (rows) of hash refs (columns)

filenames

names of files that were retrieved and checked to be locally available for that load, can be more than the defined file in File->filename (due to glob spec or zip archive package)

filesProcessed

hash for checking the processed files, necessary for cleanup at the end of the whole task

hadErrors

set to 1 if there were any errors in the process

interactive_

interactive options (these are not checked) can be used to pass arbitrary data via the command line into the script (e.g. a selected date for the run with interactive_date).

onlyExecFor

define loads to only be executed when $common{task}{execOnly} matches $load->{process}{onlyExecFor} (loads where $common{task}{execOnly} !~ $load->{process}{onlyExecFor} are skipped). Empty onlyExecFor loads are always executed regardless of $common{task}{execOnly}

successfullyDone

accumulates API sub names to prevent most API calls that ran successfully from being run again.

uploadCMD

upload command for use with uploadFileCMD

uploadCMDPath

path of upload command

uploadCMDLogfile

logfile where command given in uploadCMD writes output (for error handling)

task

contains parameters used on the task script level, only available in the %common parameter hash.

customHistoryTimestamp

optional custom timestamp to be added to filenames moved to History/HistoryUpload/FTP archive, if not given, get_curdatetime is used (YYYYMMDD_hhmmss)

execOnly

do not execute loads where $common{task}{execOnly} !~ $load->{process}{onlyExecFor}. Empty onlyExecFor loads are always executed regardless of $common{task}{execOnly}

ignoreNoTest

ignore the notest file in the process-script folder, usually preventing all runs that are not in production

plannedUntil

latest time that planned repetition should start, this can be given either as HHMM (HourMinute) or HHMMSS (HourMinuteSecond), in case of HHMM the "Second" part is attached as 59

redoFile

flag for specifying a redo

redoTimestampPatternPart

part of the regex for checking against filename in redo with additional timestamp/redoDir pattern (e.g. "redo", numbers and _), anything after files barename (and before ".$ext" if extension is defined) is regarded as a timestamp. Example: '[\d_]', the regex is built like ($ext ? qr/$barename($redoTimestampPatternPart|$redoDir)*\.$ext/ : qr/$barename($redoTimestampPatternPart|$redoDir)*.*/)

retrySecondsErr

retry period in case of error

retrySecondsErrAfterXfails

after the fail count is reached, this alternate retry period in case of error is applied. If 0/undefined, the job finishes after the fail count is reached

retrySecondsXfails

fail count after which the retrySecondsErr are changed to retrySecondsErrAfterXfails

retrySecondsPlanned

retry period in case of planned retry

skipHolidays

skip script execution on holidays

skipHolidaysDefault

holiday calendar to take into account for skipHolidays

skipWeekends

skip script execution on weekends

skipForFirstBusinessDate

used for "wait with execution for first business date", either this is a calendar or 1 (then calendar is skipHolidaysDefault), this cannot be used together with skipHolidays

COPYRIGHT

Copyright (c) 2024 Roland Kapl

All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.