NAME

Data::Mirror - a simple way to efficiently retrieve data from the World Wide Web.

VERSION

version 0.03

SYNOPSIS

    use Data::Mirror qw(:all);

    # set the global time-to-live (in seconds) of all cached resources
    $Data::Mirror::TTL_SECONDS = 30;

    # get some data
    $file   = mirror_file($url);
    $string = mirror_str($url);
    $fh     = mirror_fh($url);
    $json   = mirror_json($url);
    $xml    = mirror_xml($url);
    $yaml   = mirror_yaml($url);
    $rows   = mirror_csv($url);

DESCRIPTION

Data::Mirror tries to take away as much pain as possible when it comes to retrieving and using remote data sources such as JSON objects, YAML documents, XML instances and CSV files.

Many Perl programs need to retrieve, store, and then parse remote data resources. This can result in a lot of repetitive code, to generate a local filename, check to see if it already exists and is sufficiently fresh, retrieve a copy of the remote resource if needed, and then parse it. If a program uses data sources of many different types (say JSON, XML and CSV) then it often does the same thing over and over again, just using different modules for parsing.

Data::Mirror does all that for you, so you can focus on using the data.

USAGE

The general form of this module's API is:

    $value = Data::Mirror::mirror_TYPE($url);

where TYPE corresponds to the expected data type of the resource at $url (which can be a string or a URI).

If there's an error, the return value will be undef. The module will also carp(), so the error is reported as a warning that you can trap.

Note: a return value of undef does not always mean that retrieval failed. The remote resource may itself be something that evaluates to undef (for example, a JSON document that is exactly "null", or a YAML document that is exactly "~"), or the resource may have been retrieved but could not be parsed. Consider wrapping the function call in eval if you need to distinguish between these scenarios.
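
One way to tell these cases apart is to record whether a warning or exception occurred during the call. The following is a minimal sketch, not part of Data::Mirror: the helper name call_with_warnings is hypothetical, and two stub code references stand in for a mirror function so that the example runs without network access. It relies only on the documented behaviour that errors are reported through carp().

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper (not part of Data::Mirror): run a code reference,
# collect any warnings it emits, and trap any exception it throws.
sub call_with_warnings {
    my ($code) = @_;
    my @problems;
    local $SIG{__WARN__} = sub { push @problems, @_ };
    my $value = eval { $code->() };
    push @problems, $@ if $@;
    return ($value, @problems);
}

# Stubs standing in for a mirror function such as mirror_json($url).
my $failing = sub { carp "simulated fetch error"; return };
my $null    = sub { return };    # e.g. a JSON document that is exactly "null"

my ($v1, @p1) = call_with_warnings($failing);
my ($v2, @p2) = call_with_warnings($null);

print "first call failed: $p1[0]" if !defined $v1 && @p1;
print "second call returned a genuine undef\n" if !defined $v2 && !@p2;
```

With the real module, the code reference would be something like sub { mirror_json($url) }.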

By default, if the locally cached copy of the resource is less than $Data::Mirror::TTL_SECONDS seconds old, Data::Mirror will use it without trying to refresh it. You can override that per-request by passing the $ttl argument:

    $value = Data::Mirror::mirror_TYPE($url, $ttl);

EXPORTS

To import all the functions listed below, include :all in the tags imported by use:

    use Data::Mirror qw(:all);

You can also import specific functions separately:

    use Data::Mirror qw(mirror_json mirror_csv);

PACKAGE VARIABLES

$TTL_SECONDS

This is the global "time to live" of local copies of files, which is used if the $ttl argument is not passed to a mirror function. By default it's 300 seconds.

If Data::Mirror receives a 304 response from the server, then it will update the mtime of the local file so that another refresh will not occur until a further $TTL_SECONDS seconds have elapsed. The mtime will either be the current timestamp, or the value of the Expires header, whichever is later.
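
The freshness rule described above can be illustrated with a short sketch. This is not the module's actual code: the helper names touch_after_304 and is_fresh are hypothetical, and the Expires value is assumed to have already been converted to epoch seconds.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use List::Util qw(max);

# Hypothetical helper: after a 304 response, set the file's mtime to the
# later of the current time and the (already parsed) Expires time.
sub touch_after_304 {
    my ($file, $expires_epoch) = @_;
    my $now   = time();
    my $mtime = max($now, $expires_epoch // 0);
    utime($now, $mtime, $file) or die "utime failed for $file: $!";
    return $mtime;
}

# Hypothetical helper: a cached file is fresh while its age is below the TTL.
sub is_fresh {
    my ($file, $ttl) = @_;
    my $mtime = (stat $file)[9];
    return defined $mtime && (time() - $mtime) < $ttl;
}

# Demonstrate with a temporary file and an Expires time 10 minutes ahead.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
my $expires = time() + 600;
touch_after_304($file, $expires);

print is_fresh($file, 300) ? "cached copy is fresh\n" : "cached copy is stale\n";
```

Because the mtime may lie in the future, a later Expires value extends the period during which the cached copy counts as fresh.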

$UA

This is an LWP::UserAgent object used to retrieve remote resources. You may wish to use this variable to configure various aspects of its behaviour, such as credentials, user agent string, TLS options, etc.
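
For illustration, a few common adjustments might look like the following. The agent string and the token are made-up placeholders; the methods themselves (agent, timeout, default_header, ssl_opts) are standard LWP::UserAgent methods.

```perl
use strict;
use warnings;
use Data::Mirror;

# Identify the client; this string is a made-up example.
$Data::Mirror::UA->agent('my-fetcher/1.0');

# Give up on slow servers after 10 seconds.
$Data::Mirror::UA->timeout(10);

# Send a header with every request; the token is a placeholder.
$Data::Mirror::UA->default_header('Authorization' => 'Bearer EXAMPLE-TOKEN');

# Insist on TLS hostname verification.
$Data::Mirror::UA->ssl_opts(verify_hostname => 1);
```

Since $UA is a package variable, these settings apply to every subsequent request made through Data::Mirror in the same process.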

$JSON

This is a JSON::XS object used for JSON decoding. You may wish to use this variable to change how it processes JSON data.

$CSV

This is a Text::CSV_XS object used for CSV parsing. You may wish to use this variable to change how it processes CSV data.
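
As a rough sketch of how $JSON and $CSV might be adjusted, the example below enables relaxed JSON parsing and switches the CSV separator to a semicolon. Whether these particular settings suit your data is an assumption; relaxed is a JSON::XS method and sep_char is a Text::CSV_XS method.

```perl
use strict;
use warnings;
use Data::Mirror;

# Tolerate trailing commas and '#' comments in JSON documents.
$Data::Mirror::JSON->relaxed(1);

# Parse semicolon-separated files instead of comma-separated ones.
$Data::Mirror::CSV->sep_char(';');
```

Both objects are shared, so a change made here affects all later calls to mirror_json() and mirror_csv() in the same process.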

FUNCTIONS

mirror_file()

This method returns a string containing the name of a local file that holds the resource. All the other functions listed in this section use mirror_file() under the hood.

Data::Mirror will write local copies of files to the appropriate temporary directory (determined using File::Spec->tmpdir) and tries to reduce the risk of collision by hashing the URL and the current username. This means that different programs, run by the same user, that use Data::Mirror to retrieve the same URL, will effectively share a cache for that URL, but other users on the system will not. File permissions are set to 0600 so other users cannot read the files.
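
To make the idea concrete, here is a minimal sketch of how a per-user cache filename could be derived. It is not the module's actual naming scheme: the helper name cache_path_for, the choice of SHA-256, the "mirror-" prefix and the ".dat" suffix are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Spec;

# Hypothetical helper: hash the username and URL together so that the same
# user always gets the same file for a URL, and different users do not.
sub cache_path_for {
    my ($url, $user) = @_;
    my $digest = sha256_hex(join("\0", $user, $url));
    return File::Spec->catfile(File::Spec->tmpdir, "mirror-$digest.dat");
}

# Assumption: the username is taken from the environment.
my $user = $ENV{USER} // $ENV{USERNAME} // 'unknown';

# The URL is a placeholder.
my $path = cache_path_for('https://example.com/data.json', $user);
print "$path\n";
```

Two programs run by the same user compute the same path for the same URL, which is what allows them to share a cached copy.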

mirror_str($url)

This method returns a UTF-8 encoded string containing the resource. If it's possible that the resource might be large enough to use up a lot of memory, consider using mirror_file() or mirror_fh() instead.

mirror_fh()

This method returns an IO::File handle containing the resource.

mirror_xml()

This method returns an XML::LibXML::Document object containing the resource.

mirror_json()

This method returns a JSON data structure containing the resource. This could be undef, a simple string, or an arrayref or hashref.

mirror_yaml()

This method returns a YAML data structure containing the resource. This could be undef, a simple string, or an arrayref or hashref.

mirror_csv()

This method returns a reference to an array of arrayrefs containing the CSV rows in the resource.
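
The returned structure can be processed like any other array of arrayrefs. The sketch below uses a literal stand-in for the return value so that it runs without network access, and it assumes that the first row is a header row, which is not something Data::Mirror guarantees.

```perl
use strict;
use warnings;

# Stand-in for: my $rows = mirror_csv($url);
my $rows = [
    [ 'name',  'score' ],
    [ 'alice', 10 ],
    [ 'bob',   7 ],
];

# Assumption: the first row holds the column names.
my ($header, @records) = @$rows;

# Turn each remaining row into a hashref keyed by column name.
my @hashes = map { my %h; @h{@$header} = @$_; \%h } @records;

printf "%s scored %s\n", $_->{name}, $_->{score} for @hashes;
```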

REPORTING BUGS, CONTRIBUTING ENHANCEMENTS

This module is developed on GitHub at https://github.com/gbxyz/perl-data-mirror.

AUTHOR

Gavin Brown <gavin.brown@fastmail.uk>

COPYRIGHT AND LICENSE

This software is copyright (c) 2023 by Gavin Brown.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.