
NAME

MarpaX::Languages::Perl::PackUnpack - Parse the templates used in pack() and unpack()

Synopsis

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use MarpaX::Languages::Perl::PackUnpack ':constants';

        # -----------

        my($parser) = MarpaX::Languages::Perl::PackUnpack -> new(options => print_warnings);
        my(@text)   =
        (
                qq|n/a* # Newline
        w/a2|,
                q|a3/A A*|,
                q|i9pl|,
        );

        my($result);

        for my $text (@text)
        {
                print "Parsing: $text. \n";

                $result = $parser -> parse($text);

                print join("\n", @{$parser -> tree2string}), "\n";
                print "Parse result: $result (0 is success)\n";
                print 'Template: ', $parser -> template_report, ". \n";
                print '-' x 50, "\n";
        }

        print "\n";

        $parser -> size_report;

See scripts/synopsis.pl.

This is the output of synopsis.pl:

        Parsing: n/a* # Newline
        w/a2.
        root. Attributes: {}
           |--- token. Attributes: {lexeme => "bang_only_set", text => "n"}
           |   |--- slash_literal. Attributes: {lexeme => "slash_literal", text => "/"}
           |   |--- token. Attributes: {lexeme => "basic_set", text => "a"}
           |   |   |--- star. Attributes: {lexeme => "star", text => "*"}
           |   |--- token. Attributes: {lexeme => "basic_set", text => "w"}
           |   |   |--- slash_literal. Attributes: {lexeme => "slash_literal", text => "/"}
           |   |--- token. Attributes: {lexeme => "basic_set", text => "a"}
               |   |--- number. Attributes: {lexeme => "number", text => "2"}
        Parse result: 0 (0 is success)
        Template: n/a* w/a2.
        --------------------------------------------------
        Parsing: a3/A A*.
        root. Attributes: {}
           |--- token. Attributes: {lexeme => "basic_set", text => "a"}
           |   |--- number. Attributes: {lexeme => "number", text => "3"}
           |   |--- slash_literal. Attributes: {lexeme => "slash_literal", text => "/"}
           |   |--- token. Attributes: {lexeme => "basic_set", text => "A"}
           |   |--- token. Attributes: {lexeme => "basic_set", text => "A"}
               |   |--- star. Attributes: {lexeme => "star", text => "*"}
        Parse result: 0 (0 is success)
        Template: a3/A A*.
        --------------------------------------------------
        Parsing: i9pl.
        root. Attributes: {}
           |--- token. Attributes: {lexeme => "bang_and_endian_set", text => "i"}
           |   |--- number. Attributes: {lexeme => "number", text => "9"}
           |   |--- token. Attributes: {lexeme => "endian_only_set", text => "p"}
               |--- token. Attributes: {lexeme => "bang_and_endian_set", text => "l"}
        Parse result: 0 (0 is success)
        Template: i9 p l.
        --------------------------------------------------

        Size report:
        Byte order: 12345678. Little endian: 1. Big endian: 0.
        Some template codes and their size requirements:
        Signed  Unsigned  Name        Byte length in Perl
        s!      S!        short       2  $Config{shortsize}
        i!      I!        int         4  $Config{intsize}
        l!      L!        long        8  $Config{longsize}
        q!      Q!        longlong    8  $Config{longlongsize}

Description

MarpaX::Languages::Perl::PackUnpack provides a Marpa::R2-based parser for parsing the templates used in pack() and unpack().
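For readers new to these templates, here is a minimal sketch of what two templates like those in the Synopsis do when passed to Perl's built-in pack() and unpack(). It uses core Perl only and does not call this module. The sample data is illustrative, and 'w/a*' is used in place of the Synopsis's 'w/a2' to keep the round trip simple.

```perl
use strict;
use warnings;

# 'n/a*' packs a 16-bit big-endian length, then the string itself.
# 'w/a*' does the same, with a BER-compressed integer as the length.
my($packed) = pack('n/a* w/a*', 'hello,', 'world');
my(@fields) = unpack('n/a* w/a*', $packed);

# 'a3/A A*' reads a 3-character decimal length, then that many characters
# (with trailing spaces stripped by 'A'), then the rest of the string.
my(@bond) = unpack('a3/A A*', '007 Bond  J ');

printf "Packed %d bytes. Unpacked: %s\n", length($packed), join('|', @fields);
printf "Bond fields: %s\n", join('|', map { "[$_]" } @bond);
```

Each '/' in a template pairs a length item with the item it measures, which is why the parser reports a slash_literal between the two tokens.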

The parsed details are stored in a Tree, and can be accessed via the methods "tree2string($options, [$some_tree])" and "template_report". The tree itself can be accessed with the method "tree()".

Policy: Event names are always the same as the name of the corresponding lexeme. So any reference to 'event name' is the same as to 'lexeme name', and vice versa. This can be seen in the grammar, where every lexeme which is not discarded is associated with an event of the same name. This matter is discussed in detail under the question "How is the parsed data held in RAM?" in the "FAQ".

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.

Installation

Install MarpaX::Languages::Perl::PackUnpack as you would any Perl module:

Run:

        cpanm MarpaX::Languages::Perl::PackUnpack

or run:

        sudo cpan MarpaX::Languages::Perl::PackUnpack

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = MarpaX::Languages::Perl::PackUnpack -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type MarpaX::Languages::Perl::PackUnpack.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. "template([$string])"]):

o next_few_limit => $integer

This controls how many characters are printed when displaying 'the next few chars'.

It only affects debug output.

Default: 20.

o options => $bit_string

This allows you to turn on various options.

Default: 0 (nothing is fatal).

See the "FAQ" for details.

o template => $string

Specify the string to be parsed.

Default: ''.

Methods

bnf()

Returns a string containing the grammar.

error_message()

Returns the last error or warning message set.

Error messages always start with 'Error: '. Messages never end with "\n".

Parsing error strings is never a good idea, even though this module's format for them is fixed.

See "error_number()".

error_number()

Returns the last error or warning number set.

Warnings have values < 0, and errors have values > 0.

If the value is > 0, the message has the prefix 'Error: ', and if the value is < 0, it has the prefix 'Warning: '. If this is not the case, it's a reportable bug.

Possible values for error_number() and error_message():

o 0 => ""

This is the default value.

o 1/-1 => "Ambiguous parse. Status: $status. Terminals expected: a, b, ..."

This message is only produced when the parse is ambiguous.

If "error_number()" returns 1, it's an error, and if it returns -1 it's a warning.

You can set the option ambiguity_is_fatal to make it fatal.

o 2 => "Unexpected event name 'xyz'"

Marpa has triggered an event and its name is not in the hash of event names derived from the BNF.

This message can never be just a warning message.

o 3 => "The code does not handle these events simultaneously: a, b, ..."

The code is written to handle single events at a time, or in rare cases, 2 events at the same time. But here, multiple events have been triggered and the code cannot handle the given combination.

This message can never be just a warning message.

See "error_message()".
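The sign convention above can be sketched as a pair of small helpers. Both helper names are hypothetical and are not part of this module. They only restate the documented rule: 0 means nothing was recorded, negative values are warnings, and positive values are errors.

```perl
use strict;
use warnings;

# Classify a value as returned by error_number().
sub classify_error_number
{
    my($number) = @_;

    return 'none'    if $number == 0;
    return 'warning' if $number <  0;
    return 'error';
}

# Check that a message carries the prefix documented for its number.
sub prefix_matches
{
    my($number, $message) = @_;
    my(%prefix) = (warning => 'Warning: ', error => 'Error: ');
    my($kind)   = classify_error_number($number);

    return ($message eq '') ? 1 : 0 if $kind eq 'none';
    return (index($message, $prefix{$kind}) == 0) ? 1 : 0;
}

print classify_error_number(-1), "\n";
print prefix_matches(2, q|Error: Unexpected event name 'xyz'|) ? "ok\n" : "reportable bug\n";
```

Branching on the number rather than on the message text follows the advice under "error_message()" not to parse error strings.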

format_node($options, $node)

Returns a string consisting of the node's name and, optionally, its attributes.

Possible keys in the $options hashref:

o no_attributes => $Boolean

If 1, the node's attributes are not included in the string returned.

Default: 0 (include attributes).

Calls "hashref2string($hashref)".

Called by "node2string($options, $is_last_node, $node, $vert_dashes)".

You would not normally call this method.

If you don't wish to supply options, use format_node({}, $node).

hashref2string($hashref)

Returns the given hashref as a string.

Called by "format_node($options, $node)".

known_events()

Returns a hashref where the keys are event names and the values are 1.

new()

See "Constructor and Initialization" for details on the parameters accepted by "new()".

next_few_chars($string, $offset)

Returns a substring of $string, starting at $offset, for use in debug messages.

See next_few_limit([$integer]).
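As a rough illustration of the idea, and not the module's actual code, a stand-alone helper might look like this. The function name next_few and its third parameter are assumptions. In the module the limit comes from next_few_limit() rather than from an argument.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper; $limit plays the role of next_few_limit.
sub next_few
{
    my($string, $offset, $limit) = @_;
    $limit //= 20; # The documented default for next_few_limit.

    return '' if $offset >= length($string);
    return substr($string, $offset, $limit);
}

my($template) = 'n/a* w/a2 a3/A A* i9pl';

print next_few($template, 5), "\n";
print next_few($template, 5, 4), "\n";
```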

next_few_limit([$integer])

Here, the [] indicate an optional parameter.

Get or set the number of characters called 'the next few chars', which are printed during debugging.

'next_few_limit' is a parameter to "new()". See "Constructor and Initialization" for details.

node2string($options, $is_last_node, $node, $vert_dashes)

Returns a string of the node's name and attributes, with a leading indent, suitable for printing.

Possible keys in the $options hashref:

o no_attributes => $Boolean

If 1, the node's attributes are not included in the string returned.

Default: 0 (include attributes).

Ignore the parameter $vert_dashes. The code uses it as temporary storage.

Calls "format_node($options, $node)".

Called by "tree2string($options, [$some_tree])".

options([$bit_string])

Here, the [] indicate an optional parameter.

Get or set the option flags.

For typical usage, see scripts/synopsis.pl.

See the "FAQ" for details.

'options' is a parameter to "new()". See "Constructor and Initialization" for details.

parse([$string])

Here, the [] indicate an optional parameter.

This is the only method the user needs to call. All data can be supplied when calling "new()".

You can of course call other methods (e.g. "template([$string])" ) after calling "new()" but before calling parse().

Note: If a string is passed to parse(), it takes precedence over any string passed to new(template => $string), and over any string passed to "template([$string])". Further, the string passed to parse() is passed to "template([$string])", meaning any subsequent call to template() returns the string passed to parse().

See scripts/samples.pl.

Returns 0 for success and 1 for failure.

If the value is 1, you should call "error_number()" to find out what happened.
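A minimal sketch of checking the return value follows. So that it runs without the module installed, a small stub class stands in for the real parser. StubParser, its rule of failing on '?', and check_template() are all hypothetical. Only the method names parse(), error_number() and error_message() come from this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real parser. It 'fails' on any template
# containing '?', purely so that both branches below can be exercised.
package StubParser;

sub new
{
    my($class) = @_;

    return bless({number => 0, message => ''}, $class);
}

sub error_number  { return $_[0]{number} }
sub error_message { return $_[0]{message} }

sub parse
{
    my($self, $text) = @_;
    my($failed)      = ($text =~ /\?/) ? 1 : 0;
    $$self{number}   = $failed ? 2 : 0;
    $$self{message}  = $failed ? q|Error: Unexpected event name 'xyz'| : '';

    return $failed;
}

package main;

# Works with any object offering parse(), error_number() and error_message().
sub check_template
{
    my($parser, $text) = @_;

    return "OK: $text" if $parser -> parse($text) == 0;

    return sprintf('Failed (%d): %s', $parser -> error_number, $parser -> error_message);
}

my($parser) = StubParser -> new;

print check_template($parser, 'n/a*'), "\n";
print check_template($parser, '?'), "\n";
```

With the real module, the object returned by MarpaX::Languages::Perl::PackUnpack -> new would be passed to check_template() in place of the stub.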

size_report()

Prints some statistics for the sizes of various integers (short, int, long, etc).

See scripts/synopsis.pl.
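The figures in such a report can be read from Perl's core Config module. The sketch below is not the module's implementation. It only looks up the %Config keys named in the sample report in the Synopsis, and the function name size_lines is an assumption.

```perl
use strict;
use warnings;

use Config;

# Template code pairs, and the %Config key holding each C type's size in bytes.
my(@types) =
(
    ['s!', 'S!', 'short',    'shortsize'],
    ['i!', 'I!', 'int',      'intsize'],
    ['l!', 'L!', 'long',     'longsize'],
    ['q!', 'Q!', 'longlong', 'longlongsize'],
);

sub size_lines
{
    my(@lines) = sprintf('%-7s %-9s %-11s %s', 'Signed', 'Unsigned', 'Name', 'Bytes');

    for my $type (@types)
    {
        my($signed, $unsigned, $name, $key) = @$type;

        # Some builds lack a long long type, so the key may be undefined.
        push @lines, sprintf('%-7s %-9s %-11s %s', $signed, $unsigned, $name, $Config{$key} // 'n/a');
    }

    return \@lines;
}

print "Byte order: $Config{byteorder}\n";
print "$_\n" for @{size_lines()};
```

The numbers vary by platform, which is why the report exists: codes such as 'l!' use the native size, while plain 'l' is always 32 bits.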

template([$string])

Here, the [] indicate an optional parameter.

Get or set the string to be parsed.

'template' is a parameter to "new()". See "Constructor and Initialization" for details.

template_report

Get the string output from the parse. The code generates this string by walking the nodes of the Tree returned by a call to $self -> tree().

Apart from perhaps spacing, it will be identical to the string passed in to be parsed.

See t/test.t.
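The spacing seen in the Synopsis output ('i9pl' is reported as 'i9 p l') suggests a simple rule, sketched here over a flat list of node names and texts. The rule is inferred from that sample output, not taken from the module's source. The function name rebuild_template is hypothetical, and groups written with '(' and ')' are not considered.

```perl
use strict;
use warnings;

# Each pair is [node name, text], in the order the lexemes were recognised.
# Assumed rule: a 'token' starts a new space-separated group unless it is
# the first item or directly follows a '/'.
sub rebuild_template
{
    my(@pairs)    = @_;
    my($result)   = '';
    my($previous) = '';

    for my $pair (@pairs)
    {
        my($name, $text) = @$pair;

        $result   .= ' ' if ($name eq 'token') && ($result ne '') && ($previous ne 'slash_literal');
        $result   .= $text;
        $previous  = $name;
    }

    return $result;
}

my($report) = rebuild_template([token => 'i'], [number => '9'], [token => 'p'], [token => 'l']);

print "Template: $report. \n";
```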

tree()

Returns an object of type Tree, which holds the parsed data.

Obviously, it only makes sense to call tree() after calling "parse([$string])".

See scripts/traverse.pl for sample code which processes this tree's nodes.

If you wish to save the tree before calling "parse([$string])" again, call:

        my($tree) = $parser -> tree -> clone();

Later you can then do this to process $tree instead of $parser's tree:

        print join("\n", @{$parser -> tree2string({}, $tree)}), "\n";

tree2string($options, [$some_tree])

Here, the [] represent an optional parameter.

If $some_tree is not supplied, uses the calling object's tree ($self -> tree).

Returns an arrayref of lines, suitable for printing. These lines do not end in "\n".

Draws a nice ASCII-art representation of the tree structure.

The tree looks like:

        Root. Attributes: {# => "0"}
           |--- I. Attributes: {# => "1"}
           |   |--- J. Attributes: {# => "3"}
           |   |   |--- K. Attributes: {# => "3"}
           |   |--- J. Attributes: {# => "4"}
           |       |--- L. Attributes: {# => "5"}
           |           |--- M. Attributes: {# => "5"}
           |               |--- N. Attributes: {# => "5"}
           |                   |--- O. Attributes: {# => "5"}
           |--- H. Attributes: {# => "2"}
           |   |--- J. Attributes: {# => "3"}
           |   |   |--- K. Attributes: {# => "3"}
           |   |--- J. Attributes: {# => "4"}
           |       |--- L. Attributes: {# => "5"}
           |           |--- M. Attributes: {# => "5"}
           |               |--- N. Attributes: {# => "5"}
           |                   |--- O. Attributes: {# => "5"}

Or, without attributes:

        Root
           |--- I
           |   |--- J
           |   |   |--- K
           |   |--- J
           |       |--- L
           |           |--- M
           |               |--- N
           |                   |--- O
           |--- H
           |   |--- J
           |   |   |--- K
           |   |--- J
           |       |--- L
           |           |--- M
           |               |--- N
           |                   |--- O

See scripts/samples.pl.

Example usage:

        print map("$_\n", @{$parser -> tree2string});

Can be called with $some_tree set to any $node, and will print the tree assuming $node is the root.

If you don't wish to supply options, use tree2string({}, $node).

Possible keys in the $options hashref (which defaults to {}):

o no_attributes => $Boolean

If 1, the node's attributes are not included in the string returned.

Default: 0 (include attributes).

Calls "node2string($options, $is_last_node, $node, $vert_dashes)".

FAQ

Where are the error messages and numbers described?

See "error_message()" and "error_number()".

What are the possible values for the 'options' parameter to new()?

Firstly, to make these constants available, you must say:

        use MarpaX::Languages::Perl::PackUnpack ':constants';

Secondly, more detail on errors and warnings can be found at "error_number()".

Thirdly, for usage of these option flags, see scripts/*.pl.

Now the flags themselves:

o nothing_is_fatal

This is the default.

Its value is 0.

o debug

Print extra stuff if this flag is set.

Its value is 1.

o print_warnings

Print various warnings if this flag is set:

o The ambiguity status and terminals expected, if the parse is ambiguous
o See "error_number()" for other warnings which might be printed

Ambiguity is not, in and of itself, an error. But see the ambiguity_is_fatal option, below.

It's tempting to call this option warnings, but Perl already has use warnings, so I didn't.

Its value is 2.

o ambiguity_is_fatal

This makes "error_number()" return 1 rather than -1.

Its value is 4.
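The non-zero flag values are powers of two, which suggests they are meant to be combined with bitwise OR. The sketch below assumes so. It defines local stand-ins with the documented values so that it runs without the module. In real code, import the module's own constants via ':constants' instead. The helper flag_is_set is hypothetical.

```perl
use strict;
use warnings;

# Local stand-ins with the documented values.
use constant
{
    nothing_is_fatal   => 0,
    debug              => 1,
    print_warnings     => 2,
    ambiguity_is_fatal => 4,
};

# Combine flags with bitwise OR; test for one with bitwise AND.
my($options) = print_warnings | ambiguity_is_fatal;

sub flag_is_set
{
    my($value, $flag) = @_;

    return ($value & $flag) ? 1 : 0;
}

printf "Options: %d\n", $options;
printf "debug set: %d\n", flag_is_set($options, debug);
printf "ambiguity_is_fatal set: %d\n", flag_is_set($options, ambiguity_is_fatal);
```

The combined value would then be passed as new(options => $options), or set later via "options([$bit_string])".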

How do I print the tree built by the parser?

See "Synopsis".

How do I make use of the tree built by the parser?

See scripts/traverse.pl.

How is the parsed data held in RAM?

The parsed output is held in a tree managed by Tree.

The tree always has a root node, which has nothing to do with the input data. So, even an empty input string will produce a tree with 1 node. This root has an empty hashref associated with it.

Nodes have a name (accessed with the value() method) and a hashref of attributes (accessed with the meta() method).

The name indicates the type of node. A name is one of the following:

o 'token'

If the node's name is 'token', then the node represents one of the template characters listed in the first table in the docs for pack(). Note: both '(' and ')' are called 'token'.

o $lexeme_name

For all other lexemes identified in the parse, the node's name is the name of the lexeme itself, as given in the grammar returned by the "bnf()" method. That lexeme is the one which matched a substring of the input template.

See the following hashref for details.

For each node, the attributes hashref contains 2 keys:

o lexeme => $lexeme_name

This is always $lexeme_name (as just above), even in those cases where the node's name is 'token'.

o text => $text

This is a substring from the template being parsed. The exact contents and length of this string depend on which lexeme in the input template was recognised, which is identified by the value of the 'lexeme' key.

See scripts/traverse.pl, which prints a few trees in a format different from the one produced by "tree2string($options, [$some_tree])".
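To make the name-plus-attributes layout concrete, here is a sketch which walks a plain-hash stand-in for the tree built from 'i9pl'. The real nodes are Tree objects read with value() and meta(). The hash keys, the nesting shown and the function name walk are assumptions made for illustration only.

```perl
use strict;
use warnings;

# A plain-hash stand-in: each node has a name, attributes and children.
my($root) =
{
    name       => 'root',
    attributes => {},
    children   =>
    [
        {
            name       => 'token',
            attributes => {lexeme => 'bang_and_endian_set', text => 'i'},
            children   =>
            [
                {name => 'number', attributes => {lexeme => 'number', text => '9'}, children => []},
            ],
        },
        {name => 'token', attributes => {lexeme => 'endian_only_set', text => 'p'}, children => []},
        {name => 'token', attributes => {lexeme => 'bang_and_endian_set', text => 'l'}, children => []},
    ],
};

# Depth-first walk, returning one indented line per node.
sub walk
{
    my($node, $depth) = @_;
    my($attr)  = $$node{attributes};
    my($line)  = ('  ' x $depth) . $$node{name};
    $line     .= " ($$attr{lexeme}: '$$attr{text}')" if exists $$attr{lexeme};

    return ($line, map { walk($_, $depth + 1) } @{$$node{children} });
}

print "$_\n" for walk($root, 0);
```

Note how the root carries an empty attributes hashref, and how a 'token' node still records the specific lexeme which matched.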

What is the homepage of Marpa?

http://savage.net.au/Marpa.html.

That page has a long list of links.

How do I run author tests?

This runs both standard and author tests:

        shell> perl Build.PL; ./Build; ./Build authortest

See Also

The docs for pack().

The pack()/unpack() tutorial.

The docs for unpack().

Tree and Tree::Persist.

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Repository

https://github.com/ronsavage/MarpaX-Languages-Perl-Pack

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=MarpaX::Languages::Perl::PackUnpack.

Author

MarpaX::Languages::Perl::PackUnpack was written by Ron Savage <ron@savage.net.au> in 2015.

Marpa's homepage: http://savage.net.au/Marpa.html.

My homepage: http://savage.net.au/.

Copyright

Australian copyright (c) 2015, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License 2.0, a copy of which is available at:
        http://opensource.org/licenses/alphabetical.