NAME

Data::Tubes::Plugin::Plumbing

DESCRIPTION

This module contains tube factories for handling general plumbing requirements, e.g. putting other tubes in a sequence.

FUNCTIONS

alternatives

   $tube = alternatives(@tubes); # OR
   $tube = alternatives(@tubes, \%args);

consider a series of tubes as different alternatives, to be triggered in order until one of them returns something.

In simple terms, the first item in @tubes is called with the input record. If it returns nothing, the second item in @tubes is tried, and so on. The first one to return something (i.e. a record, or multiple ones) wins and its result is returned. Think of it as some OR function in tubeland.

If no tube returns anything, the tube itself returns nothing.
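
The selection rule can be modelled in a few lines of plain Perl. This is only a sketch of the behaviour described above, not the module's code: the helper first_non_empty and the two sample tubes are hypothetical names, and plain sub references stand in for tubes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented behaviour: try each tube in
# order and return the first non-empty result.
sub first_non_empty {
   my @tubes = @_;
   return sub {
      my ($record) = @_;
      for my $tube (@tubes) {
         # list context: "nothing" is the empty list, not undef
         if (my @outcome = $tube->($record)) {
            return @outcome;
         }
      }
      return;    # no tube returned anything
   };
}

# Two hypothetical tubes, each answering only for some records
my $only_even = sub {
   my ($n) = @_;
   return $n % 2 ? () : {kind => 'even', n => $n};
};
my $only_big = sub {
   my ($n) = @_;
   return $n > 10 ? {kind => 'big', n => $n} : ();
};

my $tube = first_non_empty($only_even, $only_big);

my ($r4)  = $tube->(4);     # the first tube answers
my ($r13) = $tube->(13);    # the first tube is silent, the second answers
my @r3    = $tube->(3);     # nobody answers, so the result is empty
```

Calling the tubes in list context is what lets the sketch tell "returned nothing" apart from "returned a false record".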

You can set the following options with the optional %args:

name

set a name for the tube, which might be useful while debugging if you plan to use more than one.

cache

   $ctube = cache($tube, %args); # OR
   $ctube = cache(%args); # OR
   $ctube = cache(\%args);

create a cache layer around another tube.

The wrapped tube can be provided either as the first unnamed parameter or via argument tube. You can set it using any of the alternatives supported by "tubify" in Data::Tubes::Plugin::Util.

The main algorithm for caching is the following:

  • a key is derived from the record. Option key can be used for this purpose; otherwise, the whole record is used as the key. In this last case, setting option output is forbidden, because the input record is assumed not to be a hash reference;

  • the cache is queried with the key and a value is retrieved;

  • if the cache did not return anything, the wrapped tube is invoked and its output is cached. If the tube returns an iterator, it is exhausted and transformed into an array reference of records. Whatever is cached also becomes the value used in the following step;

  • depending on the value, the output record(s) is(are) generated and returned.

The output of this tube can be anything except an iterator. The input record might be overridden depending on output and merger, see below.
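
The four steps can be sketched with a plain hash playing the role of the cache. This is a simplified model under stated assumptions, not the module's implementation: the helper name cached is hypothetical, the wrapped tube is assumed to return a single defined value, and the iterator and multiple-record cases are left out.

```perl
use strict;
use warnings;

# Hypothetical model of the four steps, with a plain hash as the cache
sub cached {
   my (%args) = @_;
   my ($tube, $key_sub, $output) = @args{qw< tube key output >};
   my %cache;
   return sub {
      my ($record) = @_;

      # 1. derive the key (the whole record when no key sub is given)
      my $key = $key_sub ? $key_sub->($record) : $record;

      # 2. query the cache
      my $value = $cache{$key};

      # 3. on a miss, invoke the wrapped tube and remember its result
      if (!defined $value) {
         $value = $tube->($record);
         $cache{$key} = $value;
      }

      # 4. generate the output record
      return $value unless defined $output;
      return {%$record, $output => $value};
   };
}

my $calls       = 0;
my $slow_length = sub { $calls++; return length $_[0]{word} };

my $tube = cached(
   tube   => $slow_length,
   key    => sub { $_[0]{word} },
   output => 'length',
);

my $first  = $tube->({word => 'tube', id => 1});    # miss, tube is called
my $second = $tube->({word => 'tube', id => 2});    # hit, tube is skipped
```

Both output records carry the same length field, but the wrapped tube runs only once because the two input records share the same key.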

Any time an item is set in the cache, a cleaner function might be called, if one is set in option cleaner.

Accepted arguments are:

cache

something that can be used as a cache, namely:

  • a hash reference, that will be used via "repository" in Data::Tubes::Util::Cache;

  • anything supporting the interface of Data::Tubes::Util::Cache, which is also valid for any cache valid for CHI;

  • an array reference that will be transformed into a cache object. The first element of the array can be either a sub reference or a string; if a string, it is considered the name of a module (according to the rules set for "resolve_module" in Data::Tubes::Util) and its new method is considered. The rest of the array is passed as arguments to the sub ref or the new method.

If you want to use CHI, you can do it like this:

   cache => ['!CHI', driver => 'File', root_dir => '/path/to/root']

Note that the exclamation point is necessary in this case to avoid the automatic prefixing performed by "resolve_module" in Data::Tubes::Util.

If this parameter is missing, an empty hash is assumed and Data::Tubes::Util::Cache is used.

cleaner

an optional cleaning function for avoiding cache explosion. If you set it to a string, it is supposed to be a method supported by whatever comes from cache. Otherwise, you can set it to a sub reference.

For example, if you use Data::Tubes::Util::Cache and set max_items, you might want to set cleaner to purge so that "purge" in Data::Tubes::Util::Cache will be called (otherwise, max_items will effectively be ignored).

get_options

an optional array reference of values passed when invoking method get on the cache. Ignored by "get" in Data::Tubes::Util::Cache, but not by CHI. Defaults to an empty array reference;

key

mechanism for deriving a key from the input record, to use as index in the cache. It can be:

  • a sub reference, that is run with the input record as the only parameter, and MUST return the key to use;

  • a single string or an array reference containing a sequence of strings, passed to "traverse" in Data::Tubes::Util to reach the value to use as the key.

merger

optional subroutine reference for generating an output record from an input record and a value retrieved from the cache. When defined, the sub is run with three positional parameters:

  • the input record;

  • the name of the output field (factory argument output);

  • the value to associate to output.

The default operation when returning a single record is equivalent to the following:

   {%$input_record, $output => $value}

name

name of the tube, useful when debugging. Defaults to 'cache';

output

name of the output field in the returned record. If it is not defined, the whole record is considered the output.
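
To illustrate how output and merger work together, the following sketch writes the default merge as a sub that takes the same three positional parameters a merger receives, next to a hypothetical custom merger. The field names (id, payload, score) and the sub names are assumptions made for the example.

```perl
use strict;
use warnings;

# The default merge from the text, written as a merger sub
my $default_merger = sub {
   my ($input_record, $output, $value) = @_;
   return {%$input_record, $output => $value};
};

# A hypothetical custom merger that keeps only one input field
my $slim_merger = sub {
   my ($input_record, $output, $value) = @_;
   return {id => $input_record->{id}, $output => $value};
};

my $record = {id => 7, payload => 'xxx'};

my $full = $default_merger->($record, 'score', 42);
my $slim = $slim_merger->($record, 'score', 42);
```

Here $full holds id, payload and score, while $slim holds only id and score. In both cases a new hash is built, so the input record itself is left untouched.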

set_options

an optional array reference of values passed when invoking method set on the cache. Ignored by "set" in Data::Tubes::Util::Cache, but not by CHI. Defaults to an empty array reference;

tube

the wrapped tube, i.e. the tube whose output we want to cache for later reuse. You can use whatever "tubify" in Data::Tubes::Plugin::Util accepts, which means a tube or whatever can be turned into one.

dispatch

   $tube = dispatch(%args); # OR
   $tube = dispatch(\%args);

this function decides a sub-tube to use for dispatching a specific record. The selection of the sub-tube is performed through two different mechanisms:

  • first, a selector function is applied to the input record, optionally defaulting to a configurable value. This selector is a string that MUST uniquely identify the output tube where the record should be dispatched;

  • then, if the tube associated to the selector is already known, it will be used for the dispatching. Otherwise, a factory will be used to get a new handler tube for the specific selector, if possible.

The arguments passed through %args allow you to define the selector and the factory in a flexible way. Available options are:

default

this allows defining the default selection key to use when the selector does not provide one (i.e. it returns the undefined value). If left undefined, a missing selection key will throw an exception. Defaults to undef;

factory

set a sub reference to generate new tubes when needed. The factory function will be fed with the specific selection key as the first argument, and the record as the second argument, and it is supposed to return anything that can be converted to a valid tube via "tubify" in Data::Tubes::Plugin::Util (although it might throw an exception by itself, of course);

handlers

this is a quick way to set a simple factory that just returns elements from a hash reference (that is passed as value). If this is used, every key that is not present in the hash will throw an exception;

key

this is a quick way to specify a selector function. It points to either a string/integer, or an array containing a sequence of strings/integers; these items will be used to access the provided $record in a "visit" that uses an item at each step. Example:

   $record = {aref => [1, 2, {foo => 'bar'}]};
   @key = qw< aref 2 foo >; # this will select 'bar' above

If the option selector is passed, this field will be ignored;
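
The step-by-step visit can be pictured with a small stand-alone helper. This is a sketch of the idea only: the sub visit below is hypothetical and is not the routine used by the module.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a nested structure, one key per step
sub visit {
   my ($data, @path) = @_;
   for my $step (@path) {
      if (ref $data eq 'HASH') {
         return unless exists $data->{$step};
         $data = $data->{$step};
      }
      elsif (ref $data eq 'ARRAY') {
         return unless $step =~ /\A\d+\z/ && $step < @$data;
         $data = $data->[$step];
      }
      else {
         return;    # cannot go deeper into a plain scalar
      }
   }
   return $data;
}

my $record = {aref => [1, 2, {foo => 'bar'}]};

my $selected = visit($record, qw< aref 2 foo >);    # 'bar'
my $second   = visit($record, 'aref', 1);           # 2
my @missing  = visit($record, qw< aref 5 >);        # empty list
```

Each item of the key is used as a hash key or as an array index, depending on what is found at that level of the record.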

name

set a name for the dispatcher, might be useful while debugging if you plan to use more than one dispatcher;

selector

set to a subroutine reference that will be passed the input record and SHOULD provide a string back, that will uniquely identify a tube.

Either selector or key MUST be provided. At least one of factory and handlers MUST be provided (you can provide both, in which case handlers acts as a starting point).
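
The two-stage selection can be modelled as follows. This is a stand-alone sketch of the behaviour described above, not the module's code: the helper dispatcher and the sample handlers are hypothetical, and plain sub references stand in for tubes.

```perl
use strict;
use warnings;

# Hypothetical model: pick a sub-tube by selector, creating it on demand
sub dispatcher {
   my (%args) = @_;
   my $selector = $args{selector};
   my $factory  = $args{factory};
   my $default  = $args{default};
   my %known    = %{$args{handlers} || {}};    # starting point

   return sub {
      my ($record) = @_;

      # first: compute the selection key, falling back to the default
      my $key = $selector->($record);
      $key = $default unless defined $key;
      die "no selection key for record\n" unless defined $key;

      # then: reuse a known sub-tube or ask the factory for a new one
      if (!exists $known{$key}) {
         die "no handler for '$key'\n" unless $factory;
         $known{$key} = $factory->($key, $record);
      }
      return $known{$key}->($record);
   };
}

my $created = 0;
my $tube    = dispatcher(
   selector => sub { $_[0]{type} },
   handlers => {text => sub { "text: $_[0]{body}" }},
   factory  => sub {
      my ($key) = @_;
      $created++;
      return sub { "$key (generated): $_[0]{body}" };
   },
);

my $out1 = $tube->({type => 'text', body => 'hi'});    # from handlers
my $out2 = $tube->({type => 'json', body => 'a'});     # factory is called
my $out3 = $tube->({type => 'json', body => 'b'});     # sub-tube is reused
```

The factory runs once per new selection key. A record without a type would throw an exception here, because no default is set.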

fallback

   $tube = fallback(@tubes); # OR
   $tube = fallback(@tubes, \%args);

consider a series of tubes as different alternatives, to be triggered in order until one of them does not throw an exception.

In simple terms, the first item in @tubes is called with the input record. If it throws an exception, the second item in @tubes is tried, and so on. The first one that does NOT throw an exception wins and its result is returned. Think of it as some OR function in tubeland, applied to exception throwing. This function is very similar to "alternatives", except that the alternatives are selected through exceptions instead of empty return values.

Returns nothing if all tubes throw an exception, otherwise it returns the return value of the first tube that does not throw an exception, and ignores the rest of the tubes.

The exception handling is performed via Try::Catch.

You can set the following options with the optional %args:

catch

an optional sub reference to be called when an exception is caught. The sub is called like this:

   $catcher->($exception, $record);

The return value of this function is ignored.
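
The behaviour can be sketched in plain Perl as shown below. Note the assumptions: this model uses the core eval block instead of Try::Catch, and the helper first_not_dying and the sample tubes are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical model using core eval instead of Try::Catch
sub first_not_dying {
   my @tubes   = @_;
   my $opts    = ref $tubes[-1] eq 'HASH' ? pop @tubes : {};
   my $catcher = $opts->{catch};
   return sub {
      my ($record) = @_;
      for my $tube (@tubes) {
         my @outcome;
         if (eval { @outcome = $tube->($record); 1 }) {
            return @outcome;    # may legitimately be empty
         }
         # the catcher's return value is ignored
         $catcher->($@, $record) if $catcher;
      }
      return;    # every tube threw an exception
   };
}

my @seen;
my $strict = sub {
   die "not a number\n" unless $_[0] =~ /\A\d+\z/;
   return $_[0] + 0;
};
my $lenient = sub { return 0 };

my $tube = first_not_dying($strict, $lenient,
   {catch => sub { push @seen, $_[0] }});

my ($n1) = $tube->('42');     # the strict tube succeeds
my ($n2) = $tube->('abc');    # the strict tube dies, the lenient one wins
```

Unlike the model for "alternatives", a tube that returns nothing without dying still wins here.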

name

set a name for the tube, which might be useful while debugging if you plan to use more than one.

logger

   my $tube = logger(%args); # OR
   my $tube = logger(\%args);

this function generates a tube that is useful for logging things. You can pass the following arguments:

loglevel

the level where the logging should happen. See Log::Log4perl::Tiny for the available ones. You can pass either the numeric value of the log level (as exported via :levels by Log::Log4perl::Tiny) or the log level name (uppercase, e.g. INFO or DEBUG);

name

the name assigned to the logger tube, might be useful while debugging;

target

a facility to isolate part of the target record and/or produce a message suitable for logging.

If not provided or undefined (which is the default), the whole input record will be passed to the logger function. This is probably not what you want in the vast majority of cases, as you will only see a strange address printed out. It works fine if the input record is something printable, anyway.

The most flexible thing that you can pass is a sub reference. This will receive the input record, and SHOULD return back a string that will be printed in the log stream.

You can also provide either a string or a sequence of strings in an array reference. In this case, the record will be visited using these keys, much in the same way as described for "dispatch" above. Again, you should be pretty sure that the leaf value found after this traversal is something meaningful for printing.

The generated tube always returns back the input record, unchanged.
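
The three kinds of target can be compared with a small model of how a message might be derived from a record. This is an illustrative sketch only: the helper message_for is hypothetical, it does no logging at all, and it assumes that every key in the path exists.

```perl
use strict;
use warnings;

# Hypothetical model of how a target yields the message to log
sub message_for {
   my ($record, $target) = @_;

   # no target: the whole record is stringified
   return "$record" unless defined $target;

   # sub reference: it builds the message itself
   return $target->($record) if ref $target eq 'CODE';

   # string or array reference of strings: visit the record
   my @path = ref $target eq 'ARRAY' ? @$target : ($target);
   my $data = $record;
   $data = ref $data eq 'ARRAY' ? $data->[$_] : $data->{$_} for @path;
   return "$data";
}

my $record = {user => {name => 'foo'}, hits => 3};

my $raw  = message_for($record);                        # an address
my $leaf = message_for($record, [qw< user name >]);     # 'foo'
my $text = message_for($record, sub { "hits=$_[0]{hits}" });
```

With no target, $raw is a string of the form HASH(0x...), which is the "strange address" mentioned above.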

pipeline

   $tube = pipeline(@tubes); # OR
   $tube = pipeline(@tubes, \%args);

this is a thin wrapper around "sequence", added to avoid changing its signature. It is the same as calling:

   $tube = sequence(tubes => \@tubes); # OR
   $tube = sequence(%args, tubes => \@tubes);

(depending on what you provide as input), only a bit more natural.
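
The rearrangement of arguments can be sketched as below. The helper pipeline_to_sequence_args is hypothetical and only shows the reshaping of the argument list. It relies on the assumption that tubes are passed as sub references, so that a trailing hash reference can only be the options.

```perl
use strict;
use warnings;

# Hypothetical helper: turn pipeline-style arguments into the
# key/value list passed to sequence in the equivalent calls above
sub pipeline_to_sequence_args {
   my @tubes = @_;
   my %args;
   if (@tubes && ref $tubes[-1] eq 'HASH') {
      my $options = pop @tubes;    # trailing hash reference: the options
      %args = %$options;
   }
   return (%args, tubes => \@tubes);
}

my $upcase  = sub { uc $_[0] };
my $reverse = sub { scalar reverse $_[0] };

my %plain = pipeline_to_sequence_args($upcase, $reverse);
my %named = pipeline_to_sequence_args($upcase, $reverse,
   {name => 'cleanup'});
```

In both cases the tubes end up in an array reference under the key tubes, in the same order as they were passed.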

sequence

   my $tube = sequence(\@tubes, %args); # OR
   my $tube = sequence(%args); # OR
   my $tube = sequence(\%args);

this function takes a sequence of tubes (i.e. functions that are compliant with the tube definition) and returns a tube that provides serialization of the operations, in the same order as the passed list.

The returned tube always returns an iterator. In particular, it returns two elements: the first is the string iterator and the second is the iterator itself, as a sub reference.

Arguments can be passed through a single reference to a hash, or as a sequence of key/value pairs. The following options are supported:

gate

a sub ref that is called over each intermediate record to establish if it can continue down the sequence or it should be returned immediately, depending on the truth of the returned value. The sub reference is passed the record and might change it. Defaults to undef, which means that no gating function is invoked;

name

set a name for the sequence, which might come in handy when debugging. Defaults to sequence;

logger

can be optionally set to a function that will be called for each input record, being passed the record itself and a reference to the hash of arguments. Use this if you want to do some logging, ignore otherwise;

tubes

an array reference containing the list of tubes that are part of the sequence. These can be either direct tubes (i.e. references to subroutines) or definitions suitable for calling "tube" in Data::Tubes. This parameter can also be passed as the first unnamed argument in the call to the function.

The sequence makes no assumption as to the input record, although the first element in the provided list might do.

Note that the last tube in the sequence might actually return an output record with an undef or otherwise false value (Perl-wise). To cope with this, when called in list context, the iterator is guaranteed to either return one single output record, or the empty list when the iterator is exhausted.

The suggested idiom for taking items from the iterator is then the following:

   # the tube returns the string 'iterator' followed by the iterator sub
   my (undef, $it1) = $sequence1->($input_record);
   while (my ($output_record) = $it1->()) {
      # work with $output_record here, it's your output record!
   }

   # if you're waiting for a single output record, use if
   my (undef, $it2) = $sequence2->($input_record);
   if (my ($output_record) = $it2->()) {
      # work with $output_record here, it's your output record!
   }
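
To see why the list assignment matters, the following stand-alone sketch compares it with a plain scalar assignment. The helper iterator_over is hypothetical: it only mimics an iterator that returns one record per call and the empty list when exhausted.

```perl
use strict;
use warnings;

# Hypothetical iterator over records that include undef and 0
sub iterator_over {
   my @records = @_;
   return sub {
      return unless @records;    # exhausted: empty list
      return shift @records;     # exactly one record, possibly false
   };
}

# A list assignment in boolean context counts the returned items, so
# false records do not stop the loop
my @seen;
my $it = iterator_over('first', undef, 0, 'last');
while (my ($record) = $it->()) {
   push @seen, defined $record ? $record : '<undef>';
}

# A scalar assignment stops at the first false record instead
my @naive;
my $it2 = iterator_over('first', undef, 0, 'last');
while (my $record = $it2->()) {
   push @naive, $record;
}
```

The first loop visits all four records, while the second one stops after 'first' because the following record is undef.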

BUGS AND LIMITATIONS

Report bugs either through RT or GitHub (patches welcome).

AUTHOR

Flavio Poletti <polettix@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2016 by Flavio Poletti <polettix@cpan.org>

This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.