NAME

ETL::Yertl::Transform - Transform a stream of documents

VERSION

version 0.044

SYNOPSIS

    ### Simple transform callback
    use ETL::Yertl;
    use ETL::Yertl::FormatStream;
    use ETL::Yertl::Transform;
    my $xform = ETL::Yertl::Transform->new(
        transform_doc => sub {
            # Document is in $_; return the transformed document
            return $_;
        },
        source => ETL::Yertl::FormatStream->new_for_stdin,
        destination => ETL::Yertl::FormatStream->new_for_stdout,
    );
    $xform->run;

    ### Transform class
    package Local::Transform::Dump;
    use ETL::Yertl;
    use Data::Dumper;
    use base 'ETL::Yertl::Transform';
    sub transform_doc {
        my ( $self, $doc ) = @_;
        say Dumper $doc;
        return $doc;
    }

    package main;
    use ETL::Yertl;
    my $xform = Local::Transform::Dump->new(
        source => ETL::Yertl::FormatStream->new_for_stdin,
        destination => ETL::Yertl::FormatStream->new_for_stdout,
    );

DESCRIPTION

This class holds a transformation routine in a Yertl stream. Transforms read documents from ETL::Yertl::FormatStream objects and optionally write them to another ETL::Yertl::FormatStream object. Transforms can chain to other transforms, creating a pipeline of transformations.

Transformations can be simple subroutines or full classes (inheriting from this class).

Transform Object

Create ad-hoc transform objects by passing in a transform_doc callback. The callback receives two arguments: the transform object and the document to transform. The callback should return the transformed document, whether that is a new document or the same document modified in place.

Transform Class

Create transform classes by inheriting from ETL::Yertl::Transform. Subclasses can override the transform_doc method to transform documents. This method receives the same arguments and returns the same values as the transform_doc callback, has the document available in $_, and otherwise behaves exactly like the callback.

Overloaded Operators

Transforms can be chained together using the pipe (|) operator. The result of the expression is the transform on the right side, for continued chaining.

    my $xform1 = ETL::Yertl::Transform->new(
        transform_doc => sub { ... },
    );
    my $xform2 = ETL::Yertl::Transform->new(
        transform_doc => sub { ... },
    );
    my $xform3 = $xform1 | $xform2 | ETL::Yertl::Transform->new(
        transform_doc => sub { ... },
    );

Transforms can receive sources using the << operator with an ETL::Yertl::FormatStream object. The result of the expression is the transform object, for continued chaining.

    my $input = ETL::Yertl::FormatStream->new_for_stdin;
    my $xform = ETL::Yertl::Transform->new(
        transform_doc => sub { ... },
    ) << $input;

Transforms can receive destinations using the >> operator with an ETL::Yertl::FormatStream object. The result of the expression is the transform object, for continued chaining.

    my $output = ETL::Yertl::FormatStream->new_for_stdout;
    my $xform = ETL::Yertl::Transform->new(
        transform_doc => sub { ... },
    ) >> $output;
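The three operators can be combined into one expression. The following is a minimal sketch that uses only the constructors and operators described above. It assumes that documents are hash references (the "seen" key is purely illustrative) and that calling run on the last transform drives the whole chain. Perl's shift operators bind more tightly than |, so the parentheses are there only to make the grouping obvious.

```perl
use ETL::Yertl;
use ETL::Yertl::FormatStream;
use ETL::Yertl::Transform;

my $input  = ETL::Yertl::FormatStream->new_for_stdin;
my $output = ETL::Yertl::FormatStream->new_for_stdout;

# First step: tag each document (assumes documents are hash references;
# the "seen" key is a made-up example field)
my $tag = ETL::Yertl::Transform->new(
    transform_doc => sub { $_->{seen} = 1; return $_ },
);

# Second step: pass documents through unchanged
my $pass = ETL::Yertl::Transform->new(
    transform_doc => sub { return $_ },
);

# ( $tag << $input ) yields $tag, ( $pass >> $output ) yields $pass,
# and the pipe yields the right-hand transform for further chaining
my $pipeline = ( $tag << $input ) | ( $pass >> $output );

# Assumption: running the last transform processes the whole chain
$pipeline->run;
```

Because the pipe expression evaluates to the right-hand transform, $pipeline here refers to the final step, which is the one that holds the destination.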

METHODS

new

    my $xform = ETL::Yertl::Transform->new( %args );

Create a new transform object. %args is a hash with the following keys:

source

The source for documents. Can be an ETL::Yertl::FormatStream or an ETL::Yertl::Transform object. You do not need to specify this right away, but it is required for the transform to do useful work.

destination

(optional) An ETL::Yertl::FormatStream object to write the documents to. This can be an intermediate destination or the ultimate destination. The last transform in a stream should have a destination.

transform_doc

A subref to transform the documents read from the source. The subref will receive two arguments: the transform object and the document to transform. It should return the transformed document. The document to transform is also set as $_ for simpler transforms.
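As an illustration of the $_ shorthand, the sketch below ignores the callback arguments and edits the document through $_ alone. It assumes that each document is a hash reference with a "name" key. That shape and the upcase_name helper are hypothetical and not something this module requires.

```perl
use ETL::Yertl;
use ETL::Yertl::FormatStream;
use ETL::Yertl::Transform;

# Hypothetical helper: upper-case the "name" field of the document in $_.
# Assumes the document is a hash reference; documents without a "name"
# key are returned unchanged.
sub upcase_name {
    $_->{name} = uc $_->{name} if defined $_->{name};
    return $_;
}

my $xform = ETL::Yertl::Transform->new(
    transform_doc => \&upcase_name,
    source        => ETL::Yertl::FormatStream->new_for_stdin,
    destination   => ETL::Yertl::FormatStream->new_for_stdout,
);
$xform->run;
```

Keeping the callback as a named subroutine makes it easy to exercise on its own, by setting $_ with local and calling it directly.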

configure

    $xform->configure( %args );

Configure this object. Takes the same arguments as the constructor, "new". This method allows updating any of the transform attributes later, so that transforms can be given new sources/destinations.
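For example, a transform might be created with only its callback and given its endpoints later. This is a minimal sketch that uses only the arguments documented for the constructor. The pass-through callback is a placeholder.

```perl
use ETL::Yertl;
use ETL::Yertl::FormatStream;
use ETL::Yertl::Transform;

# Create the transform without a source or destination
my $xform = ETL::Yertl::Transform->new(
    transform_doc => sub { return $_ },    # placeholder: pass through
);

# Attach the endpoints once they are known
$xform->configure(
    source      => ETL::Yertl::FormatStream->new_for_stdin,
    destination => ETL::Yertl::FormatStream->new_for_stdout,
);

$xform->run;
```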

write

    $xform->write( $doc );

Write a document explicitly. This can be used by the transform_doc callback to write documents without needing to return them from the callback.
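One use for an explicit write is emitting several documents for a single input document. The sketch below assumes that each input document is a hash reference with an "items" array, and that a callback which returns nothing causes no further document to be written. The text above implies that behavior but does not spell it out. The split_items helper and the "items" and "item" keys are hypothetical.

```perl
use ETL::Yertl;
use ETL::Yertl::FormatStream;
use ETL::Yertl::Transform;

# Hypothetical helper: turn one document holding an "items" array into
# a list of one-item documents. A missing "items" key yields no documents.
sub split_items {
    my ( $doc ) = @_;
    return map { +{ item => $_ } } @{ $doc->{items} || [] };
}

my $xform = ETL::Yertl::Transform->new(
    transform_doc => sub {
        my ( $self, $doc ) = @_;
        # Write each piece explicitly instead of returning one document
        $self->write( $_ ) for split_items( $doc );
        return;    # assumption: returning nothing writes nothing more
    },
    source      => ETL::Yertl::FormatStream->new_for_stdin,
    destination => ETL::Yertl::FormatStream->new_for_stdout,
);
$xform->run;
```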

run

    $xform->run;

Run the transform, returning when all data has been read from the source and all data has been written to the destination (if any).

SEE ALSO

ETL::Yertl, ETL::Yertl::FormatStream

AUTHOR

Doug Bell <preaction@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Doug Bell.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.