
NAME

Spreadsheet::Reader::Format::ParseExcelFormatStrings - Convert Excel format strings to code

SYNOPSIS

        #!/usr/bin/env perl
        package MyPackage;
        use Moose;

        use lib '../../../../lib';
        extends 'Spreadsheet::Reader::Format::FmtDefault';
        with    'Spreadsheet::Reader::Format::ParseExcelFormatStrings';

        package main;

        my $parser     = MyPackage->new( epoch_year => 1904 );
        my $conversion = $parser->parse_excel_format_string( '[$-409]dddd, mmmm dd, yyyy;@' );
        print 'For conversion named: ' . $conversion->name . "\n";
        for my $unformatted_value ( '7/4/1776 11:00.234 AM', 0.112311 ){
                print "Unformatted value: $unformatted_value\n";
                print "..coerces to: " . $conversion->assert_coerce( $unformatted_value ) . "\n";
        }

        ###########################
        # SYNOPSIS Screen Output
        # 01: For conversion named: DATESTRING
        # 02: Unformatted value: 7/4/1776 11:00.234 AM
        # 03: ..coerces to: Thursday, July 04, 1776
        # 04: Unformatted value: 0.112311
        # 05: ..coerces to: Monday, January 01, 1900
        ###########################

DESCRIPTION

This is the parser that converts Excel custom format strings into code that can be used to transform values into output matching the form defined by the format string. The goal of this code is to support as much of the Excel custom format string definition as possible. If you find cases where this parser differs from the Excel definition or execution, please log a case on GitHub.

This parser converts the format strings to Type::Tiny objects that have the appropriate built in coercions. Any replacement of this engine for use with Spreadsheet::Reader::Format and Spreadsheet::Reader::ExcelXML must output objects that have the methods 'display_name' and 'assert_coerce'. 'display_name' is used by the overall package to determine the cell type and should return a unique name containing an indication of the output data type with either 'DATE' or 'NUMBER' in the name. Otherwise the cell type is assumed to be text. Spreadsheet::Reader::ExcelXML uses 'assert_coerce' as the method to transform the raw value to the formatted value.

Excel format strings can have up to four parts separated by semicolons. The four parts are positive, zero, negative, and text. In the Excel application the text section is just a pass through. This is how Excel handles dates earlier than 1900ish. This parser deviates from that for dates. Since this parser provides code that parses Excel date numbers into a DateTime object (and then potentially back to a differently formatted string), it also attempts to parse strings into DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.
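
As an illustration of the first step implied above, the following is a minimal sketch of splitting a format string into its semicolon-separated sections. It is not this module's implementation: split_format_sections is a hypothetical helper, and the rules that semicolons inside double-quoted literals or after a backslash do not separate sections are assumptions made for the sketch. It uses core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): split an Excel format string
# into sections on semicolons, skipping semicolons that sit inside a
# double-quoted literal or directly after a backslash escape.
sub split_format_sections {
    my ( $format ) = @_;
    my @sections  = ( '' );
    my $in_quotes = 0;
    my @chars     = split //, $format;
    while ( @chars ) {
        my $char = shift @chars;
        if ( $char eq '\\' and !$in_quotes ) {
            # Keep the escape together with the character it protects
            $sections[-1] .= $char . ( @chars ? shift @chars : '' );
        }
        elsif ( $char eq '"' ) {
            $in_quotes = !$in_quotes;
            $sections[-1] .= $char;
        }
        elsif ( $char eq ';' and !$in_quotes ) {
            push @sections, '';    # start the next section
        }
        else {
            $sections[-1] .= $char;
        }
    }
    return @sections;
}

my @sections = split_format_sections( '[$-409]dddd, mmmm dd, yyyy;@' );
print scalar( @sections ) . " sections\n";
print "  $_\n" for @sections;
```

For the SYNOPSIS format string this yields two sections: the date pattern and the '@' text section.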

To replace this module, build a Moose::Role that delivers the methods parse_excel_format_string and get_defined_conversion. See the documentation for the format interface to integrate the replacement into the package.

Caveat Utilitor

The decimal (real number) to fraction conversion implemented here is processing intensive. I erred on the side of accuracy over speed. While I tried my best to match the accuracy of the Excel output, I was unable to duplicate the results in all cases. In those cases this package provides a more precise result than Excel. If you are experiencing delays when reading fraction formatted values then this package is a place to investigate.

In order to get the most accurate answer this parser initially uses the continued fraction algorithm to calculate a possible fraction for the passed $decimal value, with a limit of 20 iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates -all- over/under numerators with decreasing denominators, from the maximum denominator (based on the format string) all the way down to a denominator of 2, and takes the most accurate result. There is no early-out available in this second computation, so if you reach this point for multi-digit denominators things slow down. (Not that continued fractions are computationally so cheap.) However, staging the calculation this way yields either the same result as Excel or a more accurate result, while providing a possible early out in the continued fraction portion. I was unable to even come close to Excel output otherwise.

If you have a faster conversion, or just want to opt out for specific cells without replacing this whole parser, then use the worksheet method "set_custom_formats( $key => $format_object_or_string )" in Spreadsheet::Reader::ExcelXML::Worksheet. Hint: $format_object_or_string = '@' will set a pass through.
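
The two-stage search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used by this module: the sub names are hypothetical, the 1e-9 tolerance standing in for "resolves satisfactorily" is invented for the sketch, and ties in the exhaustive stage are resolved toward the smaller denominator. Values of 1 or more come back as improper fractions.

```perl
use strict;
use warnings;
use POSIX qw( floor );

# Stage 1 (hypothetical name): walk the continued fraction convergents of
# $decimal, stopping at $max_iterations or when the next denominator would
# exceed $max_denominator.
sub continued_fraction_guess {
    my ( $decimal, $max_denominator, $max_iterations ) = @_;
    my ( $prev_num, $num ) = ( 1, floor( $decimal ) );
    my ( $prev_den, $den ) = ( 0, 1 );
    my $remainder = $decimal - floor( $decimal );
    for ( 1 .. $max_iterations ) {
        last if $remainder < 1e-12;    # nothing left to expand
        my $inverse  = 1 / $remainder;
        my $term     = floor( $inverse );
        my $next_den = $term * $den + $prev_den;
        last if $next_den > $max_denominator;
        ( $prev_num, $num ) = ( $num, $term * $num + $prev_num );
        ( $prev_den, $den ) = ( $den, $next_den );
        $remainder = $inverse - $term;
    }
    return ( $num, $den );
}

# Stage 2 (hypothetical name): try the under and over numerator for every
# denominator from the maximum down to 2 and keep the closest result.
# There is no early out here, which is why this stage is slow.
sub exhaustive_guess {
    my ( $decimal, $max_denominator ) = @_;
    my ( $best_num, $best_den, $best_error );
    for my $den ( reverse 2 .. $max_denominator ) {
        my $under = floor( $decimal * $den );
        for my $num ( $under, $under + 1 ) {
            my $error = abs( $decimal - $num / $den );
            # '<=' lets a later (smaller) denominator win a tie
            if ( !defined $best_error or $error <= $best_error ) {
                ( $best_num, $best_den, $best_error ) = ( $num, $den, $error );
            }
        }
    }
    return ( $best_num, $best_den );
}

sub decimal_to_fraction {
    my ( $decimal, $denominator_digits ) = @_;
    my $max_denominator = 10 ** $denominator_digits - 1;
    my ( $num, $den ) = continued_fraction_guess( $decimal, $max_denominator, 20 );
    # ASSUMPTION: "satisfactory" is modelled as an arbitrary tolerance
    return ( $num, $den ) if abs( $decimal - $num / $den ) < 1e-9;
    return exhaustive_guess( $decimal, $max_denominator );
}

printf "%s/%s\n", decimal_to_fraction( 0.75, 1 );    # prints 3/4
```

The design point the sketch tries to show is that the cheap convergent search can return immediately for values that resolve exactly, while the exhaustive pass is only paid for when it does not.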

requires

These are methods used by this role but not provided by the role. Any class consuming this role will not build unless it provides these methods before loading this role.

get_defined_excel_format

Methods

These are the methods provided by this role to whatever class or instance consumes this role. For additional ParseExcelFormatStrings options see the Attributes section.

parse_excel_format_string( $string, $name )

    Definition: This is the method that converts Excel format strings into Type::Tiny objects with built-in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow the format string to have up to four parts separated by semicolons. These four parts correlate to four different data input ranges: positive, zero, negative, and text. If three substrings are sent then the data input is split into (positive and zero), negative, and text. If two substrings are sent then the data input is split between numbers and text. A single substring takes all comers, with the exception of dates. When a date conversion is built, this module always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All unformatted date values are then processed into, and then potentially back out of, DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.

    Accepts: an Excel number format string and a conversion name to be stored in the Type::Tiny object. This package will auto-generate a name if none is given.

    Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string

get_defined_conversion( $position )

Attributes

Data passed to new when creating a class or instance containing this role. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes.

workbook_inst

    Definition: This role works better if it has access to two workbook methods. Defaults are built in for when the workbook is not connected, but the package no longer responds dynamically when that connection is broken. This instance is a way for this role to see those settings.

    Required: No, but it's really nice

    Range: an instance of the Spreadsheet::Reader::ExcelXML class

    attribute methods Methods provided to adjust this attribute

      set_workbook_inst( $instance )

        Definition: sets the workbook instance

    delegated methods Methods provided from the object stored in the attribute

            method_name => method_delegated_from_link

cache_formats

    Definition: To avoid rebuilding coercions each time they are requested, built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not. In rare cases with lots of unique formats, turning caching off reduces RAM consumption at the price of speed.

    Range: Boolean

    Default: 1 = caching is on

    attribute methods Methods provided to adjust this attribute

      get_cache_behavior

        Definition: returns the state of the attribute

      set_cache_behavior( $bool )

        Definition: sets the value of the attribute to $bool

        Range: Boolean 1 = cache formats, 0 = Don't cache formats
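
The trade-off behind cache_formats can be illustrated with a small memoization sketch keyed on the format string. Everything here is hypothetical and core Perl only: build_coercion stands in for the expensive parsing step, and get_coercion is not a method of this role.

```perl
use strict;
use warnings;

my %format_cache;       # format string => built coercion
my $build_count = 0;    # counts how often the expensive step runs

# Hypothetical stand-in for the expensive work of building a coercion
sub build_coercion {
    my ( $format_string ) = @_;
    $build_count++;
    return sub { "[$format_string] $_[0]" };
}

# With caching on, each distinct format string is built once and reused.
# With caching off, nothing is stored and every request rebuilds.
sub get_coercion {
    my ( $format_string, $cache_formats ) = @_;
    return build_coercion( $format_string ) if !$cache_formats;
    return $format_cache{$format_string} //= build_coercion( $format_string );
}

get_coercion( '0.00', 1 ) for 1 .. 3;
print "builds with caching on: $build_count\n";
```

With caching on, memory grows with the number of distinct format strings seen. With caching off, no memory is held but every request pays the build cost again.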

datetime_dates

    Definition: You may want the full DateTime object as output rather than the finalized date string when converting unformatted date data to formatted date data. This attribute sets whether date coercions are built to do the full conversion or to stop and return a DateTime object.

    Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime objects)

    attribute methods Methods provided to adjust this attribute.

      get_date_behavior

        Definition: returns the value of the attribute

      set_date_behavior( $bool )

        Definition: sets the attribute value (only new coercions are affected)

        Accepts: Boolean values

        Delegated to the workbook class: yes

european_first

    Definition: This is a way to check for DD-MM-YY formatting of inbound (read from the file) date strings prior to checking for MM-DD-YY. Since the package always checks both ways when the number is ambiguous, the goal is to catch data where the substring for DD < 13 and assign it correctly.

    Default: 0 = MM-DD-YY[YY] is tested first

    attribute methods Methods provided to adjust this attribute

      get_european_first

        Definition: returns the value of the attribute

      set_european_first( $bool )

        Definition: sets the value of the attribute

        Range: Boolean 0 = MM-DD-YY is tested first, 1 = DD-MM-YY is tested first
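
The effect of european_first on ambiguous strings can be sketched as below. This is a simplified illustration, not this module's parsing (the module delegates date strings to DateTime::Format::Flexible): guess_day_month is a hypothetical helper that only understands numeric 'NN-NN-YYYY' or 'NN/NN/YYYY' strings and only range-checks the day and month.

```perl
use strict;
use warnings;

# Hypothetical helper: return ( day, month, year ) for a numeric date string,
# trying the preferred field order first and falling back to the other one.
sub guess_day_month {
    my ( $date_string, $european_first ) = @_;
    my ( $first, $second, $year ) =
        $date_string =~ m{^(\d{1,2})[-/](\d{1,2})[-/](\d{2,4})$}
        or return;
    my @orders = $european_first
        ? ( [ $first, $second ], [ $second, $first ] )    # DD-MM, then MM-DD
        : ( [ $second, $first ], [ $first, $second ] );   # MM-DD, then DD-MM
    for my $order ( @orders ) {
        my ( $day, $month ) = @$order;
        return ( $day, $month, $year )
            if $month >= 1 and $month <= 12 and $day >= 1 and $day <= 31;
    }
    return;
}

for my $flag ( 0, 1 ) {
    my ( $day, $month, $year ) = guess_day_month( '7/4/1776', $flag );
    print "european_first = $flag: day $day, month $month, year $year\n";
}
```

A string such as '25-12-2020' resolves the same way under either setting because only one order is valid. The flag only matters when both leading fields are below 13.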

SUPPORT

TODO

    1. Attempt to merge _split_decimal_integer and _integer_and_decimal

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2016 by Jed Lund

DEPENDENCIES

SEE ALSO