Giter VIP home page Giter VIP logo

spout's People

Contributors

adrilo avatar alamirault avatar andrew-demb avatar camuthig avatar carusogabriel avatar drowe-wayfair avatar eisberg avatar eusonlito avatar garak avatar glukkkk avatar guss77 avatar jmsche avatar joergmoenke avatar jtreminio avatar kingmar avatar larsbonczek avatar lewiscowles1986 avatar lichunqiang avatar madflow avatar nimmneun avatar philipbrown avatar rodrigowbazevedo avatar sfichera avatar stianlik avatar tobya avatar troytft avatar websourcecz avatar welcomattic avatar yannickroger avatar yiranzai avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

spout's Issues

Can only read few columns

I try to read a large excel files (more than 30 columns with column range formulas) and 11 thousands rows. This library can only read 6 columns.

Any clue why ?

No "date" type when reading XSLX

When reading a XSLX file the output types are not always what I would expect.

Observations From limited testing with some xslx files from Excel and exported document from Google Sheets give met the same result:

A cell with a Date or DateTime value appears to be converted to a FLOAT (number of days since 1899-12-30 as per Lotus 123 tradition) rather than a PHP DateTime value.

This is unexpected as Spout/Reader/XLSX/RowIterator.php explicitly defines cell type CELL_TYPE_DATE to convert Date values.

Similarly, a numeric value resulting from an evaluated expression like =12+13 is converted to a string value "25" rather than an Int or Float value of 25

Not sure if this is a bug or a limitation related to XSLX variants?

Infinite loop if csv is empty

box/spout 2.3.0
I have empty CSV file with two empty lines and, if I try to read it, it starts infinite loop:
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
...

Issue when building (intermittent)

Box\Spout\Reader\XLSXTest::testReadShouldBeProtectedAgainstBillionLaughsAttack

Even if I build master, this seems to fail to assert for PHP5.6 only... How did TravisCI pass the build, if this always fails?

Writing XLSX format and column/alignment problem

Hi,

Great script, thanks for sharing your code!
I have a problem while writing XLSX files. Some files doesn't look okay while opening them in open office. The column width is fixed and you can't optimize the width (f.e. double click on the right border of the column head in Open office). The content is aligned to the right and not left. So the data looks okay, but you can't use the file very well. It's very random only some columns behave like that.
If you open the example you will see what I mean.

https://www.dropbox.com/s/1ok7kq3swmboqz4/test_wrong.xlsx?dl=0

Is this a know issue?

PS. Strange is that I can't upload any XLSX file to this form (error: Something went really wrong, and we can’t process that file.), maybe not totally valid?

regards
Olaf

Issue with installing library manually

I've copied Spout folder to the server in root directory with following path: Box/Spout

Then declared classes as below:

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

But I'm still not able to use library's classes. My app just doesn't see these classes.

XLSX files not opening in Apple Numbers

When trying to open the XLSX file with Apple Numbers, the application alerts that the file cannot be opened. No further information is provided. However, changing the file extension to xls let Numers open that file.

error in reading ziparchive through xmlreader

I am getting following error when system tries to read ziparchive through xmlreader.

XLSX Error: Could not open <file_location> for reading! (Could not open "xl/sharedStrings.xml".)

Error occurs in extractSharedStrings() function in /Reader/Helper/XLSX/SharedStringsHelper.php file in the line where it tries to read the shared string file.

if ($xmlReader->open($sharedStringsFilePath, null....

In my system, I first upload the file in a temporary location (which is /tmp folder) and then give that location to spout class for reading.

We are using Debian Linux configuration. I checked and we have got php_zip, php_xmlreader, and php_simplexml installed. I am using 1.0.10 version of spout class.

Please help.

Spout Documentation

I would like to give spout a try, but I don't know where to start or what functionality is available. Is there a documentation page?

Thanks

ODS output

Hi I have a repo for fast-efficient writing to ODS,and I was wondering if it is a file-format you are interested to have as part of spout?

Set cell style to "wrap text" when contains multi-line string

Currently Excel does not display multi-line strings correctly. On Mac, new line characters are replaced with a space while on Windows, it is ignored. Internally though, Excel has the right data (the new line can be seen in the formula bar).

In order to fix the display, the "wrap text" attribute needs to be turned on for these cells. Spout should autodetect new lines and set this attribute automatically.

How to set input encoding for Reader?

Hi @adrilo!

How I can set input encoding 'CP1251' for reading file?

For instance, in PHPExcel that can be achieved with the following:

$objReader = new PHPExcel_Reader_CSV();
$objReader->setInputEncoding('CP1251');

Cannot open XLSX files on Windows - forward slashes must be used inside of the ZIP file, instead of DIRECTORY_SEPARATOR.

When trying to read an XLSX file, using the provided code example, the following warnings/errors are generated (cleaned up a little):

  • Warning: XMLReader::open(zip://D:\www\spikes\xlsx-reading/data/smm.xlsx#xl\sharedStrings.xml): failed to open stream: operation failed in (SharedStringsHelper.php:115)
  • Warning: XMLReader::open(): Unable to open source data in (SharedStringsHelper.php:115
  • Fatal error: Uncaught exception 'Box\Spout\Common\Exception\IOException' with message 'Could not open D:\www\spikes\xlsx-reading/data/smm.xlsx for reading! (Could not open "xl/sharedStrings.xml".)' in AbstractReader.php:90

A little investigation revealed that when using PHP zip:// stream wrapper to read a ZIP file, a forward slash (instead of a backward slash, as elsewhere in Windows) must be used when accessing files in subdirectories of the ZIP file. Particular culprit: SharedStringsHelper::getSharedStringsFilePath(), but there are others that fail, too.

I'm not certain yet, if this is the issue of the particular file, or all ZIP (XLSX) files, or my platform (Win64/8.1, Apache/2.4.12 (Win32), PHP/5.6.10, Zip/1.12.5, Libzip/0.11.2). Will investigate a little more.

Windows wamp server problem

Hi,

on windows wamp server it throws this error:

XMLReader::open(zip://D:/wamp/www/lumidee/user_uploads/xls_import/create/katalog-lucis.xlsx#/xl/worksheets/sheet1.xml): failed to open stream: operation failed

Can you help me with this error?

Inconsistent interface

Problem

As it stands the return type of ReaderFactory::create() method is XLSX while it can return both CSV and XLSX. Additionally XLSX defines methods (hasNextSheet() and nextSheet()) that are not present in neither CSV nor ReaderInterface

Proposed Solution

  1. Change the return type of ReaderFactory::create() to ReaderInterface
  2. Add the missing methods to the ReaderInterface
  3. Implement the new interface methods in the CSV class as empty methods.

The method implementation (3.) could be made at the AbstractReader level, but since csv is likely to be the only supported format that has no sheets it makes more sense to keep the sheet-less implementation at the CSV level.

Usage Example

Here's an example function that seamlessly process CSV and single-sheet XLSX files:

function processSpreadsheet(\Box\Spout\Reader\ReaderInterface $reader, $filePath)
{
    $reader->open($filePath);
    if (method_exists($reader, 'hasNextSheet')) { // don't want to check for concrete type in case more formats are implemented
        $reader->nextSheet();
    }

    processRows($reader);

    $reader->close();
}

With the described changes the above function can be simplified to this:

function processSpreadsheet(\Box\Spout\Reader\ReaderInterface $reader, $filePath)
{
    $reader->open($filePath);
    $reader->nextSheet(); // XLSX reader will select the first sheet, CSV reader will do nothing

    processRows($reader);

    $reader->close();
}

Iterator support

Branching of from discussion in #50

As @adrilo suggested, it could look something like this:

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
       // do stuff
    }
}

Your suggestion was to replace the hasNext and next methods entirely, but I am wondering if it would be feasible to have both to maintain backwards compatibility. What should happen if the old methods are called during iteration via iterators?

Unable to open xlsx in browser under iOS

First of all: I love the simplicity of your class, it works like a charm on my windows system. Unfortunately I get an error when I open the page that generates the file under iOS, I get the following error message:

OfficeImportErrorDomain-Fehler 912

Document character set assumption

The library implicitly assumes the PHP 'default_charset' option is correctly set for the data passed, as it is calling htmlspecialchars with a charset option, which will prompt conversions.
That's perfectly reasonably, but I think you should document this.

Unit Test Coverage?

Hi again,

wondering if having coverage via coveralls.io is something interesting to Box/Spout project?

Recommend to use shared-strings when writing

At least some versions of Excel (testing on Excel Mac 2011) have a bug with inline-strings. If an inline-string has a line break in it, it won't render. I have tried playing with line-ending formats, and it's not related to that. I have checked in a hex editor to confirm the output. If you then subsequently edit the sheet in Excel, Excel shifts things into its shared-string preference, and suddenly it all starts working.

I'm pretty sure this is a flat-out Microsoft bug, with them not using inline-strings natively, and hence not testing them well. It does not affect LibreOffice.

I recommend documenting this, as it impacts users here.

Coveralls file source not visible

Please can you login to coveralls and change the source folder to src, click a build, select a file, it will say it cannot find, and if you are an owner, you can then supply a prefixed folder ("src" is the value). #60 is reporting lower coverage and I would like to fix that.

IOException, failed to open stream: operation failed

I keep on getting this error?, why??
I wrote exactly as in the document...

IOException in AbstractReader.php line 90: Could not open lol.xlsx for reading! (XMLReader::open(zip://lol.xlsx#xl\sharedStrings.xml): failed to open stream: operation failed)

ZipHelper close hits max execution time

First of all, great library! So thanks! :)

I'm reading a CSV with about 25k lines and at the same time writing it into an Excel file.

        /** @var Reader $csvReader */
        $csvReader = ReaderFactory::create(Type::CSV);
        $csvReader->setFieldDelimiter($delimiter);
        $csvReader->open($fileName);

        /** @var Writer $excelWriter */
        $excelWriter = WriterFactory::create(Type::XLSX);
        $tmpExportFile = tempnam("/tmp", "export");
        $excelWriter->openToFile($tmpExportFile);

        foreach ($csvReader->getSheetIterator() as $sheet) {
            foreach ($sheet->getRowIterator() as $row) {
                $excelWriter->addRow($row);
            }
        }

        $csvReader->close();
        $excelWriter->close();

However, it actually gets to closing the writer. I'm not sure how this library works internally (i guess with tmp files).

screen shot 2015-09-30 at 10 57 59

Any idea how I can process this without increasing the max execution time in php?

Trying to get property of non-object: /vendor/box/spout/src/Spout/Reader/XLSX.php:227

Hi,

I get the following error:
Trying to get property of non-object: /vendor/box/spout/src/Spout/Reader/XLSX.php:227
reading the following Excel with the following code:

$data = array();
        $reader = \Box\Spout\Reader\ReaderFactory::create(\Box\Spout\Common\Type::XLSX);
        $reader->open($file);
        $counter = 0;
        $header = array();
        while ($reader->hasNextSheet()) {
            $reader->nextSheet();
            while ($reader->hasNextRow()) {
                $counter++;
                if ($counter == 1) {
                    $header = $reader->nextRow();
                    continue;
                }
                $data[] = array_combine($header, $reader->nextRow());
            }
        }
        $reader->close();
        return $data;

2015-06-03 14_47_32-microsoft excel - template xlsx

Error: Shared string not found for index

While reading excel sheet we are having a weird issue. We get the excel file from the external system. It runs fine if run unedited but throws following exception when we edit and save the same file. Even if we just add extra space or delete a sheet, the file throws the same exception.

Shared string not found for index: 1432

0 /opt/lampp/htdocs/mr/app/Vendor/Box/Spout/Reader/XLSX.php(247): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(1432)

1 /opt/lampp/htdocs/mr/app/Vendor/Box/Spout/Reader/AbstractReader.php(131): Box\Spout\Reader\XLSX->read()

It displays random value which is not even in the current sheet. Note that we tested just by running the sample code provided under "How to read a XLSX file?" in documentation.

Any idea what may be going wrong?

$reader->open($filename) takes all the time

I am trying to read a very large excel file, having multiple sheets, in which two or more sheets contain more than 600K records with columns ranging from A to AD.

When I run your sample code of going through each sheet one by one and and iterating through all the rows the program never came back.

I commented all the code and leaving the call to open function, it still takes the same time. Is there a way around of skipping this opening time?

Xls support

Does this library supports old 2003 office format? if not - would be nice add it

ZipArchive and Windows

windows wouldnt open paths like:'xl/sharedStrings.xml', for me need to be 'xl\sharedStrings.xml' otherwise it will drop:

XMLReader::open(zip://...20150506052548.xlsx#xl/sharedStrings.xml): failed to open stream: operation failed in ...\vendor\box\spout\src\Spout\Reader\Helper\XLSX\SharedStringsHelper.php on line 97

Add Column Widths

This is not an issue for me necessarily , just letting you guys know in Worksheet.php right where you begin the sheet (in startSheet()) you can define column widths.

I ended up just prepending the following string to give columns 1 (min) through 7 (max) columns (A:G) with the specified parameters, right before the tag.

Great library, Really prefer it to some of the memory intensive alternatives. If no one has time I can do it myself, but not immediately as I've got a bunch of work to catch up with at the moment. Thanks for your great work. Really came in handy.

WAMP and Spout 2.8 XMLReader::open Issue

I installed spout using composer.

Still having error on windows.
Warning: XMLReader::open(zip://xlsx_files/one_sheet_with_inline_strings.xlsx#xl/sharedStrings.xml): failed to open stream: operation failed in C:\wamp\www\excel_reader_test\vendor\box\spout\src\Spout\Reader\Wrapper\XMLReader.php on line 37

I have tried the suggested fix.
str_replace('/', DIRECTORY_SEPARATOR, $this->filePath)

Still same error except for the path
zip://xlsx_files\one_sheet_with_inline_strings.xlsx#xl/sharedStrings.xml

So I thought of changing the slash on the xml path
$str_xml = str_replace('/', DIRECTORY_SEPARATOR, self::SHARED_STRINGS_XML_FILE_PATH);
return 'zip://' . str_replace('/', DIRECTORY_SEPARATOR, $this->filePath) . '#' . self::SHARED_STRINGS_XML_FILE_PATH;

Same error. So I tried changing the slash in the constant.
const SHARED_STRINGS_XML_FILE_PATH = 'xl/sharedStrings.xml';

I got this error
Warning: file_get_contents(zip://xlsx_files\one_sheet_with_inline_strings.xlsx#[Content_Types].xml): failed to open stream: operation failed in C:\wamp\www\excel_reader_test\vendor\box\spout\src\Spout\Common\Helper\GlobalFunctionsHelper.php on line 167

Here is my setup.
Win 7
Apache 2.4.9
PHP 5.5.12
spout 2.3

PHP Info
libxml 2.9.1
zip 1.11.0
Libzip 0.10.1
zLib 1.2.7.3

Any Help would be greatly appreciated.

Option to use SplFixedArray for shared strings, instead of file-based cache.

I have a proposal [to have an option] to make Spout faster by using SplFixedArray instead of files for storing shared strings. At the moment, switching shared-string cache files is one of the biggest performance hogs. On small files, it's not that big a deal. On big files it becomes painful.

I slapped together a quick spike to check how storing shared strings in files vs. array vs. SPL array compare in an XLSX reading test. Here are the benchmarks on my machine:

Test1.xlsx: 1.28 MB XLSX file, 9884 rows, 1.04 MB sharedStrings.xml

memory_get_peak_usage(), MB reader->open() time, s Total read time, s
Files 1.43 7.03 288.89
Array 4.91 3.54 22.55
SPL array 2.97 3.51 22.79

Test2.xlsx: 79 MB XLSX file, 859988 rows, 53.5 MB sharedStrings.xml

memory_get_peak_usage(), MB reader->open() time, s Total read time, s
Files 1.59 290.11 20945.36
Array 224.58 162.93 1300.42
SPL array 137.32 174.85 1262.81

I suggest to have an option in Spout to use in-memory SharedStringsHelper, when needed, which would then use SplFixedArray for storing them.

You can see my changes to the code at kewlar@bc72e2c, if you like. It was meant only for benchmarking the cases, but you can see that the SplFixedArray implementation for shared strings takes little enough code. I didn't figure out how to elegantly tell ReaderFactory (or XLSX) which shared strings helper strategy to use, though.

I understand that one of the goals of Spout was to use as little memory as possible, and I think the way it was accomplished was (and still is) ingenious, and I understand you may want to keep it this way. Still, in some use cases it may be preferable to sacrifice some memory for greater processing speed.

What do you think?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.