NAME

pdf-collage - PDF manipulation with scissors and glue

VERSION

The version can be retrieved with option --version:

   $ pdf-collage --version

USAGE

   pdf-collage [--help] [--man] [--usage] [--version]

   pdf-collage [--data|--data-from|-d path [...]]
               [--list|--list-selectors|-l]
               [--output|-o output]
               [--selector|-S string]
               [--source|-s source [...]]

EXAMPLES

   # expand a plain JSON file with some data, redirect output PDF
   pdf-collage --source plain.json foo=bar baz=12 > test.pdf

   # use a "proper" bundle instead, containing a single template inside
   pdf-collage -s bundle.pdfc foo=bar baz=12 > test.pdf

   # output can be controlled with --output|-o
   pdf-collage -s bundle.pdfc -o test.pdf foo=bar baz=12

   # input stuff can be JSON too
   pdf-collage -s bundle.pdfc -o test.pdf '{"foo":"bar","baz":12}'

   # there can be more
   pdf-collage -s bundle.pdfc -o test.pdf '{"foo":"bar"}' '{"baz":12}'

   # data can be loaded from files. "Free" arguments will win though
   pdf-collage -s bundle.pdfc -o test.pdf -d data.json foo=bar

   # the output filename might be expanded as a Template::Perlish thing
   pdf-collage -s bundle.pdfc -o '[% name %].pdf' -d record.json

   # if the source contains multiple templates, it's possible to list them
   pdf-collage -s several.pdfc --list

   # in this case a selector is needed
   pdf-collage -s several.pdfc -S my-template foo=bar baz=12 > test.pdf

   # complicated things are... doable, like handling the generation of
   # multiple PDF files each with its own name generated on the fly,
   # starting from a common base and picking specific customizations
   pdf-collage -s bundle.pdfc -s other-bundle.pdfc \
      -o '[% name %]-[% id %].pdf' \
      -d common-data.json foo=bar baz=galook \
      '[{"name": "you", "id": 1}, {"name": "me", "id": 2}]'

DESCRIPTION

Generate PDFs much like the mail merge function that is common to at least two big office automation suites.

It proceeds from two input types: one or more sources of templates, and one or more records of data. They are merged to generate one PDF file for each record.

If the source (or sources) contains multiple templates, it's possible to list them with command-line option --list|-l, then pass one of the printed selector strings with command-line option --selector|-S.

It's possible to use Template::Perlish templates in many places, both inside the sources of data and elsewhere, e.g. when selecting the right source to use or naming the output file.

Collage Sources Gathering

There are two main types of collage sources: plain templates or collections. While both are supported, chances are that a collection is a better choice for anything but simple one-off needs, as it allows packing together several different artifacts and provides a more portable solution.

Sources can be provided with the --source command-line option or its aliases. They can represent JSON data, or directories, or file names:

  • if a source starts with optional spaces followed by a first non-space character that is either [ or {, then it is considered JSON data. If a file is actually named like that, it can still be referenced through its full path or a relative path, so that the source no longer starts with [ or {.

  • otherwise, if it's a directory... it's a directory

  • otherwise, it must be a plain file. If the first non-space character within the initial 10 bytes is either [ or { then it is considered JSON data, otherwise it is considered a TAR archive.

A source representing JSON data is considered a single template and treated as such; see "Single template". If such a source is present, it must be the only source: any additional source will be considered an error.

A source that is either a directory or a TAR archive is considered a collection of templates. It's possible to have several collections, which will be considered collectively, with the ones appearing first in the command line taking precedence over the following when looking for stuff inside of them while rendering PDF files.
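The classification rules above can be sketched in a few lines. This is only an illustration in Python, not the tool's Perl implementation: the function name classify_source and its return labels are made up for the example, and "optional spaces" is assumed to mean any whitespace.

```python
import os
import re


def classify_source(spec: str) -> str:
    """Classify a --source value following the three rules listed above.

    The returned labels are hypothetical names used only in this sketch.
    """
    # Rule 1: optional spaces followed by '[' or '{' means inline JSON data.
    if re.match(r"\s*[\[{]", spec):
        return "json-text"
    # Rule 2: a directory is a collection of templates.
    if os.path.isdir(spec):
        return "directory"
    # Rule 3: a plain file; only the initial 10 bytes are inspected.
    with open(spec, "rb") as handle:
        head = handle.read(10)
    if head.lstrip()[:1] in (b"[", b"{"):
        return "json-file"
    # Anything else is assumed to be a TAR archive.
    return "tar"
```

Both "json-text" and "json-file" lead to a single template, while "directory" and "tar" lead to a collection.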

Record(s) Data Collection

Collecting data for doing the merge can be done in multiple ways.

On one side, every command-line argument that is not part of the available options is considered a source for such record's data, in one of three forms:

  • if the first non-space character is a {, then it's considered a JSON object, parsed as a hash and merged into a common hash of values

  • if the first non-space character is a [, then it's considered a JSON array, parsed as an array and its elements added to a list of records

  • otherwise, it's considered a key/value pair, separated by the first occurrence of a separator. Again, different alternatives are supported:

    • if the separator is #= or ::, the value part on the right is considered to be encoded with Base64, so it's decoded accordingly

    • otherwise, if the separator is = or one single :, then the value is taken verbatim.

    In both cases the key part is considered a trail of segments to navigate through the common data and set the value. As an example, a key foo.bar.baz would set $common{foo}{bar}{baz}; the rules are the same as in Template::Perlish's traverse.
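As a rough illustration of the three argument forms, here is a Python sketch (not the tool's Perl code). The names collect_argument and set_path are hypothetical. Several points are assumptions: keys are split on plain dots only (Template::Perlish's traverse has richer rules), JSON objects are merged with a simple overwrite, Base64 values are assumed to decode to UTF-8 text, and an argument without any separator raises an error.

```python
import base64
import json
import re

# Leftmost separator wins; two-character separators are tried first
# when they start at the same position.
SEPARATOR = re.compile(r"#=|::|=|:")


def set_path(common: dict, key: str, value: str) -> None:
    """Set common[a][b][c] = value for key 'a.b.c' (plain dot-splitting)."""
    *parents, leaf = key.split(".")
    node = common
    for segment in parents:
        node = node.setdefault(segment, {})
    node[leaf] = value


def collect_argument(arg: str, common: dict, records: list) -> None:
    """Route one free command-line argument into common data or records."""
    stripped = arg.lstrip()
    if stripped.startswith("{"):
        common.update(json.loads(arg))   # JSON object: common values
    elif stripped.startswith("["):
        records.extend(json.loads(arg))  # JSON array: more records
    else:
        match = SEPARATOR.search(arg)
        if match is None:
            raise ValueError(f"no separator found in {arg!r}")
        key, value = arg[:match.start()], arg[match.end():]
        if match.group() in ("#=", "::"):
            value = base64.b64decode(value).decode("utf-8")
        set_path(common, key, value)
```

With this sketch, foo.bar.baz=12 ends up as a nested value, and secret#=aGVsbG8= stores the decoded string hello under secret.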

It's also possible to feed data files with the --data option or its aliases. These files are always considered JSON files, with either objects (hashes) or arrays inside, and are handled as explained above. Files are always scanned first, so the respective values or records are handled before other data.

After the collection is complete, records are assembled. If no record was provided (i.e. no non-empty array appeared during collection), then the common data is considered a lone record and returned as such.

Otherwise, each collected record is merged with the common data and returned. This allows using the common data as providing defaults for all records, while still being able to set record-specific data or override some defaults.

Merging of hashes is performed onto a base one (the previously collected data, or the common data) based on the additional data (the new data or the specific record's data), with the following rules for handling each key/value pair:

  • If the very first character in the key is -, then the rest of the key is considered the real key and added onto the base only if it does not already contain a value. This allows for setting a late default.

  • If the very first character in the key is =, then it's stripped from the key and the key/value pair is set in the base hash. This allows supporting keys that need to begin with a literal - character, without incurring the behaviour of the previous bullet (e.g. target key -foo would be provided as =-foo).

  • Otherwise, the key/value pair is added onto the hash.

There is no attempt at doing a deep merge of hash values, so only the top-level will be handled.
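The merging rules and the final assembly of records can be sketched as follows. This is a Python illustration, not the tool's Perl code; the function names merge and assemble are hypothetical, and "does not already contain a value" is assumed to mean "the key is not present".

```python
def merge(base: dict, extra: dict) -> dict:
    """Shallow merge of extra onto a copy of base, honouring key prefixes."""
    merged = dict(base)
    for key, value in extra.items():
        if key.startswith("-"):
            # Late default: only set when the real key is missing.
            merged.setdefault(key[1:], value)
        elif key.startswith("="):
            # Escape: strip '=' and set, so '=-foo' targets key '-foo'.
            merged[key[1:]] = value
        else:
            merged[key] = value
    return merged


def assemble(common: dict, records: list) -> list:
    """Turn the collected common data and records into the final records."""
    if not records:
        return [common]  # the common data acts as a lone record
    return [merge(common, record) for record in records]
```

Because the merge is shallow, a nested hash in a record replaces the whole nested hash coming from the common data rather than being combined with it.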

Templates Resolution

If a simple/single template is provided as JSON data, there's no resolution to be done and it's used directly.

Otherwise, if the source is a collection (or several collections), it makes sense to select one of the included templates.

First of all, it's possible to use option --list|-l to get a list of all available templates inside; the strings printed on standard output represent the selectors that can later be used to point out the specific template that is needed.

If there is only one selector, it's not necessary to pass it when invoking the program, as it will be used automatically. Otherwise, it is necessary to use command-line option --selector|-S to pass the selector string.
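The selection logic can be pictured with this small Python sketch. The function name pick_selector is hypothetical, and raising an error for an unknown or missing selector is an assumption about behaviour that the text above does not spell out.

```python
from typing import List, Optional


def pick_selector(available: List[str], requested: Optional[str] = None) -> str:
    """Choose the template selector to use among the available ones."""
    if requested is not None:
        if requested not in available:
            raise ValueError(f"unknown selector {requested!r}")
        return requested
    if len(available) == 1:
        return available[0]  # a lone selector is used automatically
    raise ValueError("several templates available, --selector|-S is needed")
```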

Writing Templates

At the basic level, a template is a list of commands inside a properly-formatted JSON file.

Many times, though, these commands will refer to specific artifacts, e.g. one or more input PDF files from which pages should be taken; this is where a templates collection source is better, as it allows packing the JSON file with the commands together with all artifacts (including fonts, if needed) inside a directory or a TAR archive (for best portability).

Single template

The JSON template is a string (or a file) containing the instructions for rendering a PDF file. It can have two forms: an object or an array.

In the former case, the object MUST contain a key commands whose corresponding value is an array with the list of commands; in the latter, the array is directly the container for the list of commands.
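A reader of such a template only needs to normalize the two forms into one list of commands. The following Python sketch shows the idea; the function name commands_from_template and the error messages are hypothetical and not taken from the tool.

```python
import json


def commands_from_template(text: str) -> list:
    """Return the list of commands from either form of a JSON template."""
    data = json.loads(text)
    if isinstance(data, list):
        return data  # array form: the array is the list of commands
    if isinstance(data, dict):
        commands = data.get("commands")
        if not isinstance(commands, list):
            raise ValueError("object form needs a 'commands' array")
        return commands
    raise ValueError("a template must be a JSON object or a JSON array")
```

Both '[{"op": "add-page"}]' and '{"commands": [{"op": "add-page"}]}' yield the same single-command list.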

The following commands are supported:

add-image
   {
      "op": "add-image",
      "page": 1,
      "path": "/path/to/image.png",
      "x": 10,
      "y": 30,
      "width": 10,
      "height": 10
   }

Add an image. See PDF::Builder for the supported formats.

add-page
   { "op": "add-page" }

Add an empty page at the end.

   { "op": "add-page", "page": 1 }

Add an empty page as page number 1.

   {
      "op": "add-page",
      "page": 2,
      "from-path": "/some/file.pdf",
      "from-page": 3
   }

Get page 3 from file /some/file.pdf and add it as page 2 in the PDF that is built.

Key from-path can also be abbreviated as from.

add-text
   {
      "op": "add-text",
      "page": 1,
      "font": "DejaVuSans.ttf",
      "font-size": 12,
      "text": "whatever",
      "x": 10,
      "y": 20
   }

Place a text label on the PDF.

The font key can be replaced with font-family.

There are three ways of defining the text:

text

This text is taken verbatim and takes precedence over the other alternatives.

text-template

This text is expanded using Template::Perlish. It takes precedence over "text-variable".

text-variable

This is meant to be a variable that is expanded using Template::Perlish on the data provided.
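The precedence among the three keys can be sketched as below. This is a Python illustration only: real expansion is done by Template::Perlish, which is replaced here by a deliberately tiny stand-in that only understands [% dotted.path %] placeholders. The names resolve_text and lookup are hypothetical.

```python
import re


def lookup(data: dict, path: str):
    """Follow a dotted path such as 'user.id' inside nested dictionaries."""
    node = data
    for segment in path.split("."):
        node = node[segment]
    return node


def resolve_text(command: dict, data: dict) -> str:
    """Pick the label text: text, then text-template, then text-variable."""
    if "text" in command:
        return command["text"]  # taken verbatim
    if "text-template" in command:
        # Stand-in expansion: substitute each [% path %] with its value.
        return re.sub(
            r"\[%\s*([\w.]+)\s*%\]",
            lambda match: str(lookup(data, match.group(1))),
            command["text-template"],
        )
    if "text-variable" in command:
        return str(lookup(data, command["text-variable"]))
    raise KeyError("no text, text-template or text-variable in command")
```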

log
   {
      "op": "log",
      "level": "info",
      "message": "whatever!"
   }

Print a log message. If Log::Any is available, it will use it; otherwise, warn is used.

set-defaults
   {
      "op": "set-default",
      "font": "DejaVuSans.ttf",
      "font-size": 12,
      "level": "info"
   }

Set some defaults that will be used in the following commands. This allows, e.g., setting the font or the font size once for all subsequent "add-text" commands.

Templates collection

A templates collection is a bundle that allows packing together multiple templates, as well as artifacts that can be referred from these templates.

In its basic form, it is a directory with the structure that is detailed below. This directory can also be packed as a TAR archive, which can be used as a collection too, for maximum portability.

JSON templates MUST be files with extension .json put inside a sub-directory named definitions. Other artifacts can be placed anywhere.

It's possible to refer to the artifacts bundled in the collection using function as_file(), which is injected in the Template::Perlish namespace and can thus be used in Template::Perlish templates. As an example, if the bundle includes a font file in location assets/fonts/shiny.ttf, it's possible to use it in an add-text command like this:

   {
      "op": "add-text",
      "page": 1,
      "font": "[%= as_file('assets/fonts/shiny.ttf') %]",
      "font-size": 12,
      "text": "whatever",
      "x": 10,
      "y": 20
   }

Similarly, for taking a page from a bundled PDF file in location assets/pdf/models.pdf inside the directory:

   {
      "op": "add-page",
      "page": 2,
      "from-path": "[%= as_file('assets/pdf/models.pdf') %]",
      "from-page": 3
   }

OPTIONS

--data|--data-from|-d path

load some data from the file at the specific path, assuming it's JSON.

JSON objects (hashes) contribute to a common set of data.

JSON arrays (of hashes) add records.

--help

print out some help and exit.

--list|--list-selectors|-l

print out the list of available selectors from provided sources.

--man

show the manual page for pdf-collage.

--output|-o output-spec

the output filename, defaulting to - which means standard output.

It is treated as a template string expanded with each record's data.

--selector|-S string

a selector string for templates with multiple definitions inside.

It is treated as a template string expanded with each record's data.

--source|-s specification

a suitable input for taking instructions for building the PDF. If it is a file holding JSON data, it is treated as a simple template; otherwise it's considered a collection of templates bundled with artifacts, which usually implies that a selector will be needed (unless the bundle contains one single definition only).

--usage

show usage instructions.

--version

show version.

BUGS AND LIMITATIONS

Please report any bugs or feature requests through the repository at https://codeberg.org/polettix/PDF-Collage.

AUTHOR

Flavio Poletti

LICENSE AND COPYRIGHT

Copyright 2023 by Flavio Poletti (flavio@polettix.it).

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

or look for file LICENSE in this project's root directory.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.