box / spout Goto Github PK
View Code? Open in Web Editor NEWRead and write spreadsheet files (CSV, XLSX and ODS), in a fast and scalable way
Home Page: http://opensource.box.com/spout/
License: Apache License 2.0
Read and write spreadsheet files (CSV, XLSX and ODS), in a fast and scalable way
Home Page: http://opensource.box.com/spout/
License: Apache License 2.0
I try to read a large excel files (more than 30 columns with column range formulas) and 11 thousands rows. This library can only read 6 columns.
Any clue why ?
When reading a XSLX file the output types are not always what I would expect.
Observations From limited testing with some xslx files from Excel and exported document from Google Sheets give met the same result:
A cell with a Date or DateTime value appears to be converted to a FLOAT (number of days since 1899-12-30 as per Lotus 123 tradition) rather than a PHP DateTime
value.
This is unexpected as Spout/Reader/XLSX/RowIterator.php
explicitly defines cell type CELL_TYPE_DATE
to convert Date values.
Similarly, a numeric value resulting from an evaluated expression like =12+13
is converted to a string value "25"
rather than an Int
or Float
value of 25
Not sure if this is a bug or a limitation related to XSLX variants?
How to set cell background and font color
This is not needed and it's hard to properly maintain
It does not seem to be useful
There seems to be an error in workbook.xml, as I always get a message dialog when opening generated files in MS Excel. Files can be opened but they need to be repaired. Are you aware of the issue?
box/spout 2.3.0
I have empty CSV file with two empty lines and, if I try to read it, it starts infinite loop:
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
getNextUTF8EncodedLine
getEncodedEOLDelimiter
isEmptyLine
...
Box\Spout\Reader\XLSXTest::testReadShouldBeProtectedAgainstBillionLaughsAttack
Even if I build master, this seems to fail to assert for PHP5.6 only... How did TravisCI pass the build, if this always fails?
Hi,
Great script, thanks for sharing your code!
I have a problem while writing XLSX files. Some files doesn't look okay while opening them in open office. The column width is fixed and you can't optimize the width (f.e. double click on the right border of the column head in Open office). The content is aligned to the right and not left. So the data looks okay, but you can't use the file very well. It's very random only some columns behave like that.
If you open the example you will see what I mean.
https://www.dropbox.com/s/1ok7kq3swmboqz4/test_wrong.xlsx?dl=0
Is this a know issue?
PS. Strange is that I can't upload any XLSX file to this form (error: Something went really wrong, and we can’t process that file.), maybe not totally valid?
regards
Olaf
I've copied Spout folder to the server in root directory with following path: Box/Spout
Then declared classes as below:
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;
But I'm still not able to use library's classes. My app just doesn't see these classes.
When trying to open the XLSX file with Apple Numbers, the application alerts that the file cannot be opened. No further information is provided. However, changing the file extension to xls let Numers open that file.
Currently, $reader->nextSheet() does not return anything. It would be useful to have it return the current sheet and therefore expose its name.
I am getting following error when system tries to read ziparchive through xmlreader.
XLSX Error: Could not open <file_location> for reading! (Could not open "xl/sharedStrings.xml".)
Error occurs in extractSharedStrings() function in /Reader/Helper/XLSX/SharedStringsHelper.php file in the line where it tries to read the shared string file.
if ($xmlReader->open($sharedStringsFilePath, null....
In my system, I first upload the file in a temporary location (which is /tmp folder) and then give that location to spout class for reading.
We are using Debian Linux configuration. I checked and we have got php_zip, php_xmlreader, and php_simplexml installed. I am using 1.0.10 version of spout class.
Please help.
I would like to give spout a try, but I don't know where to start or what functionality is available. Is there a documentation page?
Thanks
Hi I have a repo for fast-efficient writing to ODS,and I was wondering if it is a file-format you are interested to have as part of spout?
Currently Excel does not display multi-line strings correctly. On Mac, new line characters are replaced with a space while on Windows, it is ignored. Internally though, Excel has the right data (the new line can be seen in the formula bar).
In order to fix the display, the "wrap text" attribute needs to be turned on for these cells. Spout should autodetect new lines and set this attribute automatically.
Hi @adrilo!
How I can set input encoding 'CP1251' for reading file?
For instance, in PHPExcel that can be achieved with the following:
$objReader = new PHPExcel_Reader_CSV();
$objReader->setInputEncoding('CP1251');
When trying to read an XLSX file, using the provided code example, the following warnings/errors are generated (cleaned up a little):
A little investigation revealed that when using PHP zip:// stream wrapper to read a ZIP file, a forward slash (instead of a backward slash, as elsewhere in Windows) must be used when accessing files in subdirectories of the ZIP file. Particular culprit: SharedStringsHelper::getSharedStringsFilePath(), but there are others that fail, too.
I'm not certain yet, if this is the issue of the particular file, or all ZIP (XLSX) files, or my platform (Win64/8.1, Apache/2.4.12 (Win32), PHP/5.6.10, Zip/1.12.5, Libzip/0.11.2). Will investigate a little more.
Hi,
on windows wamp server it throws this error:
XMLReader::open(zip://D:/wamp/www/lumidee/user_uploads/xls_import/create/katalog-lucis.xlsx#/xl/worksheets/sheet1.xml): failed to open stream: operation failed
Can you help me with this error?
stream_get_line($handle, $length, $ending);
if length === 0 then max bytes = 8192
/vendor/box/spout/src/Spout/Reader/CSV/RowIterator.php
line 150
http://joxi.ru/BLmGBg6HboJkrl
25000 - save the situation
I have very long lines
A row must have at least one cell, to avoid a repair error in Excel.
API users may add empty rows as spacers.
See my simple change here:
https://github.com/ocproducts/spout/commit/d2029a084fe45e9902e948fe5f1b6ef0cceec6a8
As it stands the return type of ReaderFactory::create()
method is XLSX
while it can return both CSV
and XLSX
. Additionally XLSX
defines methods (hasNextSheet()
and nextSheet()
) that are not present in neither CSV
nor ReaderInterface
ReaderFactory::create()
to ReaderInterface
ReaderInterface
CSV
class as empty methods.The method implementation (3.) could be made at the AbstractReader
level, but since csv
is likely to be the only supported format that has no sheets it makes more sense to keep the sheet-less implementation at the CSV
level.
Here's an example function that seamlessly process CSV and single-sheet XLSX files:
function processSpreadsheet(\Box\Spout\Reader\ReaderInterface $reader, $filePath)
{
$reader->open($filePath);
if (method_exists($reader, 'hasNextSheet')) { // don't want to check for concrete type in case more formats are implemented
$reader->nextSheet();
}
processRows($reader);
$reader->close();
}
With the described changes the above function can be simplified to this:
function processSpreadsheet(\Box\Spout\Reader\ReaderInterface $reader, $filePath)
{
$reader->open($filePath);
$reader->nextSheet(); // XLSX reader will select the first sheet, CSV reader will do nothing
processRows($reader);
$reader->close();
}
Branching of from discussion in #50
As @adrilo suggested, it could look something like this:
foreach ($reader->getSheetIterator() as $sheet) {
foreach ($sheet->getRowIterator() as $row) {
// do stuff
}
}
Your suggestion was to replace the hasNext
and next
methods entirely, but I am wondering if it would be feasible to have both to maintain backwards compatibility. What should happen if the old methods are called during iteration via iterators?
First of all: I love the simplicity of your class, it works like a charm on my windows system. Unfortunately I get an error when I open the page that generates the file under iOS, I get the following error message:
OfficeImportErrorDomain-Fehler 912
The library implicitly assumes the PHP 'default_charset' option is correctly set for the data passed, as it is calling htmlspecialchars with a charset option, which will prompt conversions.
That's perfectly reasonably, but I think you should document this.
How add a row using addRow and addRows methods?
Hi again,
wondering if having coverage via coveralls.io is something interesting to Box/Spout project?
At least some versions of Excel (testing on Excel Mac 2011) have a bug with inline-strings. If an inline-string has a line break in it, it won't render. I have tried playing with line-ending formats, and it's not related to that. I have checked in a hex editor to confirm the output. If you then subsequently edit the sheet in Excel, Excel shifts things into its shared-string preference, and suddenly it all starts working.
I'm pretty sure this is a flat-out Microsoft bug, with them not using inline-strings natively, and hence not testing them well. It does not affect LibreOffice.
I recommend documenting this, as it impacts users here.
Please can you login to coveralls and change the source folder to src, click a build, select a file, it will say it cannot find, and if you are an owner, you can then supply a prefixed folder ("src" is the value). #60 is reporting lower coverage and I would like to fix that.
Due to use of PHP 'empty'.
Sample fix:
https://github.com/ocproducts/spout/commit/fc85be64a796226aa03ee6aa4ed8b8902d73fe93
XMLReader::next()
skips all subtrees so it should be faster
I keep on getting this error?, why??
I wrote exactly as in the document...
IOException in AbstractReader.php line 90: Could not open lol.xlsx for reading! (XMLReader::open(zip://lol.xlsx#xl\sharedStrings.xml): failed to open stream: operation failed)
First of all, great library! So thanks! :)
I'm reading a CSV with about 25k lines and at the same time writing it into an Excel file.
/** @var Reader $csvReader */
$csvReader = ReaderFactory::create(Type::CSV);
$csvReader->setFieldDelimiter($delimiter);
$csvReader->open($fileName);
/** @var Writer $excelWriter */
$excelWriter = WriterFactory::create(Type::XLSX);
$tmpExportFile = tempnam("/tmp", "export");
$excelWriter->openToFile($tmpExportFile);
foreach ($csvReader->getSheetIterator() as $sheet) {
foreach ($sheet->getRowIterator() as $row) {
$excelWriter->addRow($row);
}
}
$csvReader->close();
$excelWriter->close();
However, it actually gets to closing the writer. I'm not sure how this library works internally (i guess with tmp files).
Any idea how I can process this without increasing the max execution time in php?
Halooo
How to get Columns name and write to particular Column ?
setTempFolder(), setShouldUseInlineStrings() and setShouldCreateNewSheetsAutomatically() must be called before opening the writer, otherwise they have no effect.
Throwing an exception if this is done after would make it explicit
Hi,
I get the following error:
Trying to get property of non-object: /vendor/box/spout/src/Spout/Reader/XLSX.php:227
reading the following Excel with the following code:
$data = array();
$reader = \Box\Spout\Reader\ReaderFactory::create(\Box\Spout\Common\Type::XLSX);
$reader->open($file);
$counter = 0;
$header = array();
while ($reader->hasNextSheet()) {
$reader->nextSheet();
while ($reader->hasNextRow()) {
$counter++;
if ($counter == 1) {
$header = $reader->nextRow();
continue;
}
$data[] = array_combine($header, $reader->nextRow());
}
}
$reader->close();
return $data;
While reading excel sheet we are having a weird issue. We get the excel file from the external system. It runs fine if run unedited but throws following exception when we edit and save the same file. Even if we just add extra space or delete a sheet, the file throws the same exception.
Shared string not found for index: 1432
0 /opt/lampp/htdocs/mr/app/Vendor/Box/Spout/Reader/XLSX.php(247): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(1432)
1 /opt/lampp/htdocs/mr/app/Vendor/Box/Spout/Reader/AbstractReader.php(131): Box\Spout\Reader\XLSX->read()
It displays random value which is not even in the current sheet. Note that we tested just by running the sample code provided under "How to read a XLSX file?" in documentation.
Any idea what may be going wrong?
I am trying to read a very large excel file, having multiple sheets, in which two or more sheets contain more than 600K records with columns ranging from A to AD.
When I run your sample code of going through each sheet one by one and and iterating through all the rows the program never came back.
I commented all the code and leaving the call to open function, it still takes the same time. Is there a way around of skipping this opening time?
Instead of row based styles it would be nice to have style for each column (which is inserted by row).
I'm wondering is it possible in this lib to find certain cell by some content(text) in it?
I need to find cell this way and add new cell just right after it.
Does this library supports old 2003 office format? if not - would be nice add it
windows wouldnt open paths like:'xl/sharedStrings.xml', for me need to be 'xl\sharedStrings.xml' otherwise it will drop:
XMLReader::open(zip://...20150506052548.xlsx#xl/sharedStrings.xml): failed to open stream: operation failed in ...\vendor\box\spout\src\Spout\Reader\Helper\XLSX\SharedStringsHelper.php on line 97
This is not an issue for me necessarily , just letting you guys know in Worksheet.php right where you begin the sheet (in startSheet()) you can define column widths.
I ended up just prepending the following string to give columns 1 (min) through 7 (max) columns (A:G) with the specified parameters, right before the tag.
Great library, Really prefer it to some of the memory intensive alternatives. If no one has time I can do it myself, but not immediately as I've got a bunch of work to catch up with at the moment. Thanks for your great work. Really came in handy.
I installed spout using composer.
Still having error on windows.
Warning: XMLReader::open(zip://xlsx_files/one_sheet_with_inline_strings.xlsx#xl/sharedStrings.xml): failed to open stream: operation failed in C:\wamp\www\excel_reader_test\vendor\box\spout\src\Spout\Reader\Wrapper\XMLReader.php on line 37
I have tried the suggested fix.
str_replace('/', DIRECTORY_SEPARATOR, $this->filePath)
Still same error except for the path
zip://xlsx_files\one_sheet_with_inline_strings.xlsx#xl/sharedStrings.xml
So I thought of changing the slash on the xml path
$str_xml = str_replace('/', DIRECTORY_SEPARATOR, self::SHARED_STRINGS_XML_FILE_PATH);
return 'zip://' . str_replace('/', DIRECTORY_SEPARATOR, $this->filePath) . '#' . self::SHARED_STRINGS_XML_FILE_PATH;
Same error. So I tried changing the slash in the constant.
const SHARED_STRINGS_XML_FILE_PATH = 'xl/sharedStrings.xml';
I got this error
Warning: file_get_contents(zip://xlsx_files\one_sheet_with_inline_strings.xlsx#[Content_Types].xml): failed to open stream: operation failed in C:\wamp\www\excel_reader_test\vendor\box\spout\src\Spout\Common\Helper\GlobalFunctionsHelper.php on line 167
Here is my setup.
Win 7
Apache 2.4.9
PHP 5.5.12
spout 2.3
PHP Info
libxml 2.9.1
zip 1.11.0
Libzip 0.10.1
zLib 1.2.7.3
Any Help would be greatly appreciated.
I have a proposal [to have an option] to make Spout faster by using SplFixedArray instead of files for storing shared strings. At the moment, switching shared-string cache files is one of the biggest performance hogs. On small files, it's not that big a deal. On big files it becomes painful.
I slapped together a quick spike to check how storing shared strings in files vs. array vs. SPL array compare in an XLSX reading test. Here are the benchmarks on my machine:
Test1.xlsx: 1.28 MB XLSX file, 9884 rows, 1.04 MB sharedStrings.xml
memory_get_peak_usage(), MB | reader->open() time, s | Total read time, s | |
---|---|---|---|
Files | 1.43 | 7.03 | 288.89 |
Array | 4.91 | 3.54 | 22.55 |
SPL array | 2.97 | 3.51 | 22.79 |
Test2.xlsx: 79 MB XLSX file, 859988 rows, 53.5 MB sharedStrings.xml
memory_get_peak_usage(), MB | reader->open() time, s | Total read time, s | |
---|---|---|---|
Files | 1.59 | 290.11 | 20945.36 |
Array | 224.58 | 162.93 | 1300.42 |
SPL array | 137.32 | 174.85 | 1262.81 |
I suggest to have an option in Spout to use in-memory SharedStringsHelper, when needed, which would then use SplFixedArray for storing them.
You can see my changes to the code at kewlar@bc72e2c, if you like. It was meant only for benchmarking the cases, but you can see that the SplFixedArray implementation for shared strings takes little enough code. I didn't figure out how to elegantly tell ReaderFactory (or XLSX) which shared strings helper strategy to use, though.
I understand that one of the goals of Spout was to use as little memory as possible, and I think the way it was accomplished was (and still is) ingenious, and I understand you may want to keep it this way. Still, in some use cases it may be preferable to sacrifice some memory for greater processing speed.
What do you think?
To remove functions from the RowIterator
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.