
NAME

ETL::Pipeline::Output::Memory - Store records in memory

SYNOPSIS

  # Save the records into a giant list.
  use ETL::Pipeline;
  ETL::Pipeline->new( {
    input   => ['UnitTest'],
    mapping => {First => 'Header1', Second => 'Header2'},
    output  => ['Memory']
  } )->process;

  # Save the records into a hash, keyed by an identifier.
  use ETL::Pipeline;
  ETL::Pipeline->new( {
    input   => ['UnitTest'],
    mapping => {First => 'Header1', Second => 'Header2'},
    output  => ['Memory', key => 'First']
  } )->process;

DESCRIPTION

ETL::Pipeline::Output::Memory writes each record into a Perl data structure in memory, so the records can be accessed later in the same script. This output destination is useful when processing multiple input files.

ETL::Pipeline::Output::Memory offers two ways of storing the records: in a hash or in a list. ETL::Pipeline::Output::Memory always puts records into the list. If the "key" attribute is set, then ETL::Pipeline::Output::Memory also saves records into the hash.

The hash can be used for faster look-up. Use "key" when the record contains an identifier.
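To illustrate why the hash speeds up look-ups, here is a minimal plain-Perl sketch of the two structures. It does not use ETL::Pipeline::Output::Memory itself; the sample records and the choice of 'First' as the key field are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample records; 'First' plays the role of the "key" field.
my @list = (
    { First => 'A1', Second => 'apple'   },
    { First => 'B2', Second => 'banana'  },
    { First => 'A1', Second => 'apricot' },
);

# Index the same records by key: each key maps to an array reference.
my %hash;
push @{ $hash{ $_->{First} } }, $_ for @list;

# Without the index, finding a key means scanning every record.
my @scanned = grep { $_->{First} eq 'A1' } @list;

# With the index, the look-up is a single hash access.
my $indexed = $hash{A1} // [];

printf "scan: %d, index: %d\n", scalar @scanned, scalar @$indexed;   # scan: 2, index: 2
```

Both approaches find the same records; the scan costs time proportional to the number of records, while the hash access does not.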

METHODS & ATTRIBUTES

Arguments for "output" in ETL::Pipeline

key

Optional. If you want to store the records in a hash, then this is the field name whose value becomes the key. When set, records go into "hash" as well as "list".

If you don't specify a key, then records are stored only in "list", an array kept in the order the records were read.

Attributes

hash

Hash reference used when "key" is set. The key is the value of the field identified by "key". The value is an array reference. The array contains all of the records with that same key.

list

list is an array reference that stores records. The records are saved in the same order as they are read from the input source. Each list element is a hash reference (the record).

list always has a complete set of records, whether "key" is set or not.

Methods

close

This method doesn't do anything. There's nothing to close or shut down.

number_of_ids

Count of unique identifiers. This may not be the same as the number of records. One key may have multiple records.

number_of_ids only works if the "key" attribute was set.
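The difference between the two counts can be sketched with plain Perl structures shaped like "list" and "hash". This is an illustration under the assumption that three records were written with key => 'First'; the variable names mirror the method names but are not the module's code.

```perl
use strict;
use warnings;

# Hypothetical storage after three writes with key => 'First'.
my @list = ( { First => 'A1' }, { First => 'B2' }, { First => 'A1' } );
my %hash;
push @{ $hash{ $_->{First} } }, $_ for @list;

my $number_of_records = scalar @list;        # every record counts
my $number_of_ids     = scalar keys %hash;   # each distinct key counts once

print "records=$number_of_records ids=$number_of_ids\n";   # records=3 ids=2
```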

number_of_records

Count of records currently in storage.

open

This method doesn't do anything. There's nothing to open or set up.

records

Returns a list of all the records currently in storage. The list contains hash references - one reference for each record.

with_id

with_id returns the records for a given key. Pass in a value for the key and with_id returns an array reference of the matching records.

with_id only works if the "key" attribute was set.
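The sketch below shows how a caller might consume the array reference. The with_id sub here is a hypothetical stand-in working on a plain hash, not the module's method; it assumes that an unknown key yields undef, so the caller guards with `// []` before dereferencing.

```perl
use strict;
use warnings;

my %hash = (
    A1 => [ { First => 'A1', Second => 'apple' }, { First => 'A1', Second => 'apricot' } ],
    B2 => [ { First => 'B2', Second => 'banana' } ],
);

# Hypothetical stand-in for with_id: return the array reference stored for a key.
sub with_id {
    my ($id) = @_;
    return $hash{$id};    # undef when no record had this key
}

my $matches = with_id('A1');
print scalar(@$matches), " records for A1\n";              # 2 records for A1
print join(', ', map { $_->{Second} } @$matches), "\n";    # apple, apricot

# Guard against keys that never appeared before dereferencing.
my $none = with_id('Z9') // [];
print scalar(@$none), " records for Z9\n";                 # 0 records for Z9
```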

write

Save the current record into memory. Your script can access the records after calling "process" in ETL::Pipeline like this: $etl->output->records. Both "records" and "with_id" can be used.

If "key" is set, write saves the record in both "hash" and "list". Both hold a reference, not a copy, so the extra cost is small. Keeping every record in "list" is also what allows methods such as "number_of_records" to work.
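A minimal sketch of the storage step described above, in plain Perl. The write_record sub is hypothetical, not the module's write method, and turning a missing key field into an empty string is an assumption made only to keep the sketch warning-free.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my (@list, %hash);
my $key = 'First';    # corresponds to the "key" argument

# Hypothetical stand-in for write: store one reference in both places.
sub write_record {
    my ($record) = @_;
    if (defined $key) {
        my $id = $record->{$key} // '';    # assumption: missing key field becomes ''
        push @{ $hash{$id} }, $record;
    }
    push @list, $record;
}

write_record({ First => 'A1', Second => 'apple' });
write_record({ First => 'B2', Second => 'banana' });

# Both structures point at the very same hash, not a copy.
print refaddr($list[0]) == refaddr($hash{A1}[0]) ? "shared\n" : "copied\n";   # shared
```

Because only a reference is pushed twice, the second store costs one array slot rather than a duplicate record.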

WARNING: This method stores a reference to the original record. If the input source re-uses the same hash or embedded references for later records, those changes also show up in every record already stored. ETL::Pipeline::Output::Memory does not make a copy.
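The hazard can be reproduced in plain Perl without the module. The first loop stores a reference to one re-used hash, much as an input source that re-uses its buffer would; the second loop shows the usual Perl remedy of storing a copy. Where such a copy would be made in an actual pipeline is left open here.

```perl
use strict;
use warnings;

# An input source that re-uses one hash for every record (the risky case).
my %buffer;
my @list;
for my $fruit (qw(apple banana)) {
    $buffer{Name} = $fruit;
    push @list, \%buffer;          # stores a reference, like write does
}
print join(', ', map { $_->{Name} } @list), "\n";     # banana, banana

# Copying before storing keeps each record independent.
my @copies;
for my $fruit (qw(apple banana)) {
    $buffer{Name} = $fruit;
    push @copies, { %buffer };     # shallow copy of the top-level hash
}
print join(', ', map { $_->{Name} } @copies), "\n";   # apple, banana
```

A shallow copy like `{ %buffer }` does not protect embedded references; for those, a deep copy such as Storable's dclone would be needed.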

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Output

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vumc.org>

LICENSE

Copyright 2021 (c) Vanderbilt University

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.