dilshod / xlsx2csv Goto Github PK

View Code? Open in Web Editor NEW

1.6K 1.6K 302.0 597 KB

Convert xslx to csv, it is fast, and works for huge xlsx files

License: MIT License

Python 100.00%

xlsx2csv's People

Contributors

Stargazers

Watchers

Forkers

djmitche jablan hempalex daemon13 dremora foleyx90 jaalto gony55 nelsonko aubinlrx felixr beck917 aforward-oss srmaximiano stephenzl kbhatta outoftheboxx mankyuhan guileen kbroman joshuashen gladiatr72 maverickg rockscience kenahoo respectpotentialenergy pctj101 dw250100785 ui-volodymyr-shaboltas zouf jesseky ngocthanhit marvinlav emirhg whwc rafatower shidh wadebiyi kmila csojinb neeraj-vmware hcw70 luftreich ttabor1977 yalechang hehaotian yongjunjian kempwrites ihuro marcomzp ashleyquinton quyiqy nickdietel umo2000 oddsix luluti caot seanliangyoung doloopwhile bawalker tomtrask nndiaye mukolx oztc rulnick amboxer21 xjzhou mnjstwins bwhiton xtmgah huskyeder alittlesliceoftom prasadgaddai rockykitamura tlevine lucienfostier sebratt liubo404 rod-dot-codes caseycs arbruijn caomw starlein faionsax voonik336 bobo987cro artem-bez theavey benwilson512 cargosense jeffreyvdb prabhatgouda mdekauwe freeart lewis-liu ninoninkovic hskksk eawag-rdm ekwurzele vukio

xlsx2csv's Issues

Invalid multiline string values handling

For this file https://dl.dropboxusercontent.com/u/16118108/01-11-2013-BeCompact-USD%3D321.xlsx i get such output:

"line1: line 1
Line2: line2"

I expect a bit different:

"line1: line 1\nLine2: line2"

May be it will be better to escape newline characters in output?

How large an excel file can it handle ?

Hi, please help, what's this problem ?

Traceback (most recent call last):
File "/usr/local/bin/xlsx2csv", line 550, in
xlsx2csv.convert(outfile, sheetid)
File "/usr/local/bin/xlsx2csv", line 162, in convert
self._convert(sheetid, outfile)
File "/usr/local/bin/xlsx2csv", line 200, in _convert
sheet.to_csv(writer)
File "/usr/local/bin/xlsx2csv", line 362, in to_csv
self.parser.ParseFile(self.filehandle)
File "/usr/local/bin/xlsx2csv", line 377, in handleCharData
xfs_numfmt = self.styles.cellXfs[s]
IndexError: list index out of range

Added outfile.close() to xlsx2csv()

Dilshod,

I added an outfile.close() to the finally statement in xlsx2csv() because garbage collection (I believe) wasn't closing the file quickly enough for my use case. I'm just letting you know so that you can add it to master if you think other's would benefit as well.

Thanks,
Nathan

Shared string data is not resolved correctly in some cases.

In some cases cell data is not parsed correctly.
Investigation and debug output shows that these cells contain shared string and use 2 exec of handleCharData function for each. But when script resolves shared data address to string it uses data of last exec not self.collected_string (or self.data). In result you have almost random data in cell.

Change in 296th line of script
int(data) > int(self.data)

Diff:

diff xlsx2csv.py xlsx2csv_fixed.py
296c296
<                 self.data = self.sharedStrings[int(data)]

---
>                 self.data = self.sharedStrings[int(self.data)]

Looks like this is fix after "Issue #15 (Misinterpretation of &". Data parsing is fixed.

Misinterpretation of &

If there is an ampersand in a cell the tool will split up the cell contents.

Example:
Cell: Hello & World

will be parsed to
Hello
World

I made a temporary hack in my version by:

def handleStartElement(self, name, attrs):
        self.collectedString = ""

and

    def handleCharData(self, data):
        if self.in_cell_value: # defines that there is a value between a set of nodes
            self.collectedString += data
            self.data = self.collectedString # default value

Please add tag "release/N.N.N" in the future

Hi,

It would nice if the repository were tagged on release in order to (1) diff between releases (2) use automated tools to package releases in regular basis (http://githubredir.debian.net/).

Use of "release/" prefix helps to find tags by name easily with git tag -l:

release/1.0
release/1.1
release/1.10
release/1.2

Plese list project at freecode.com

The Freecode is kinda "yellow pages" of Open Source software. It would be useful if the project were listed there; just open an account and you're set.

Once set up, there is nothing to maintain, unless you want to issue release announcements for the subsribers of project watchers.

Date formatting issue (v0.14)

An xlsx file I'm processing contains a date cell that looks like "9/15/2011 15:22" in Excel. I haven't been able to figure out what format to pass to xlsx2csv in order to resolve this.
(1) [no format passed]: m/15/11 h:09
(2) %Y/%m/%d %hh:%mm : 2011/09/15 Seph:09m
(3) %Y/%m/%d %H:%M : 2011/09/15 00:00

The last seem closest, but I would like to see the hours:minutes resolved.

Syntax Error with Python 2.3

Yes, it's ancient Python, but still:

line 780: parser.add_argument("-s", "--sheet", dest="sheetid", default=1, type=int, help="sheet number to convert")

produces a

Traceback (most recent call last):
  File "xlsx2csv.py", line 780, in ?
    help="sheet number to convert")
  File "/usr/lib/python2.3/optparse.py", line 820, in add_option
    option = self.option_class(*args, **kwargs)
  File "/usr/lib/python2.3/optparse.py", line 430, in __init__
    checker(self)
  File "/usr/lib/python2.3/optparse.py", line 499, in _check_type
    raise OptionError("invalid option type: %r" % self.type, self)```

Change it to

line 780: parser.add_argument("-s", "--sheet", dest="sheetid", default=1, type="int", help="sheet number to convert")

and it works. (Note the "" around int.)

Crash on simple file

Excel 2007 file with content
8001990029363
8001990029387
8001990028694
8001990028663
8001990028649
8001990028854
8001990048081
8001990029370
8001990029394
8001990048180
8001990027710

Code:

from openpyxl.reader.excel import load_workbook

book = load_workbook('error.xlsx', use_iterators=True)
sheet = book.get_active_sheet()

for row in sheet.iter_rows():
    print [cell.internal_value for cell in row]

Crash with error:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    for row in sheet.iter_rows():
  File "C:\Python\lib\site-packages\openpyxl\reader\iter_worksheet.py", line 236, in get_squared_range
    cell = cell._replace(internal_value=shared_date.from_julian(float(cell.internal_value)))
  File "C:\Python\lib\site-packages\openpyxl\shared\date_time.py", line 162, in from_julian
    return EPOCH + datetime.timedelta(days=utc_days)
OverflowError: Python int too large to convert to C long

Can't get it to work in Win 7 - Encoding Problem?

I executed in a Windows command shell the following command:

python d:\yy\ww\py\lib\site-packages\xlsx2csv.py "D:\yy\ww\XLS2CSV Converter\source\xxx.xlsx" "D:\yy\xxx\XLS2CSV Converter\distination\vv.csv"

and get the following error message:

Traceback (most recent call last):
File "d:\yy\ww\py\lib\site-packages\xlsx2csv.py", line 847, in
xlsx2csv.convert(outfile, sheetid)
File "d:\yy\ww\py\lib\site-packages\xlsx2csv.py", line 178, in convert
self._convert(sheetid, outfile)
File "d:\yy\ww\py\lib\site-packages\xlsx2csv.py", line 247, in _convert
sheet.to_csv(writer)
File "d:\yy\ww\py\lib\site-packages\xlsx2csv.py", line 558, in to_csv
self.parser.ParseFile(self.filehandle)
File "d:\yy\ww\py\lib\site-packages\xlsx2csv.py", line 706, in handleEndElement
self.writer.writerow(d)
UnicodeEncodeError: 'cp932' codec can't encode character '\xd4' in position 215: illegal multibyte sequence

I'm using Python 3.4.0 and xlsx2csv-0.6.1 installed via pip.
What am I doing wrong?

Won't exclude sheetname "Sheet2" and "Sheet3"

In the excel file "Sheet2" and "Sheet3" are blank so I'd like to exclude them, I keep passing the following command:

xlsx2csv -a -E"Sheet2" -E"Sheet3" file.xlsx

but the produced output still contains delimiters for these two sheets. I've also tried putting a space between the -E option and the names, and I've tried excluding the quotation marks, but to no avail.

Is it possible to add xls support?

xlsx to csv works great but fails with the following for xls files. xls might not even be supported but if you could add xls support, that had be awesome! Thank you!

Traceback (most recent call last):
File "/home/thinkcode/libs/python/xlsx2csv.py", line 344, in ?
xlsx2csv(args[0], f, **kwargs)
File "/home/thinkcode/libs/python/xlsx2csv.py", line 91, in xlsx2csv
ziphandle = zipfile.ZipFile(infilepath)
File "/usr/lib64/python2.4/zipfile.py", line 210, in init
self._GetContents()
File "/usr/lib64/python2.4/zipfile.py", line 230, in _GetContents
self._RealGetContents()
File "/usr/lib64/python2.4/zipfile.py", line 242, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

Blank cells in XLSX not reflected in CSV

When converting an XLSX file that has emtpy cells at the end of the row, the corresponding row in the CSV is missing the trailing commas needed for using the CSV as an import into a database orr other system that validates the number of expected columns are present.

Ignore <rPh> Element

Excel has capability holds phonetic for cell value as as . element placed inner tag. So, the current implements shows mixed value and phonetic.
I've avoid the problem like below code. I hope apply the code.

def handleStartElement(self, name, attrs):
    if name == 'si':
        self.si = True
        self.value = ""
    elif name == 't' and self.rPh:
        self.t = False
    elif name == 't' and self.si:
        self.t = True
    elif name == 'rPh':
        self.rPh = True

def handleEndElement(self, name):
    if name == 'si':
        self.si = False
        self.strings.append(self.value)
    elif name == 't':
        self.t = False
    elif name == 'rPh':
        self.rPh = False

handleStartElement matching fails and leads to AttributeError

Getting "null-pointer exceptions" from xlsx2csv when parsing some files:

Traceback (most recent call last):
  File "/usr/local/bin/xlsx2csv", line 847, in <module>
    xlsx2csv.convert(outfile, sheetid)
  File "/usr/local/bin/xlsx2csv", line 178, in convert
    self._convert(sheetid, outfile)
  File "/usr/local/bin/xlsx2csv", line 247, in _convert
    sheet.to_csv(writer)
  File "/usr/local/bin/xlsx2csv", line 558, in to_csv
    self.parser.ParseFile(self.filehandle)
  File "/usr/local/bin/xlsx2csv", line 660, in handleStartElement
    startCol = start.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

An example where this problem occurs is here.

According to pip, my currently installed version is the latest, but xlsx2csv -v only outputs the script's name (xslx2csv - want me to open a separate ticket for that?). I see I do have the --merge-cells options available from the script.

IndexError: list index out of range

I have this error :
Traceback (most recent call last):
File "/usr/local/bin/xlsx2csv", line 5, in
pkg_resources.run_script('xlsx2csv==0.6', 'xlsx2csv')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/xlsx2csv-0.6-py2.7.egg/EGG-INFO/scripts/xlsx2csv", line 548, in
xlsx2csv = Xlsx2csv(options.infile, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/xlsx2csv-0.6-py2.7.egg/EGG-INFO/scripts/xlsx2csv", line 155, in init
self.workbook = self._parse(Workbook, "xl/workbook.xml")
File "/usr/local/lib/python2.7/dist-packages/xlsx2csv-0.6-py2.7.egg/EGG-INFO/scripts/xlsx2csv", line 219, in _parse
instance.parse(filehandle)
File "/usr/local/lib/python2.7/dist-packages/xlsx2csv-0.6-py2.7.egg/EGG-INFO/scripts/xlsx2csv", line 239, in parse
sheets = workbookDoc.firstChild.getElementsByTagName("sheets")[0]
IndexError: list index out of range

you can find excel file at :
http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0

thanks

Converting Apache POI excel files to csv ?

I can't covert xlsx documents from Apache POI . No error messages, just making a csv file that contains row numbers.
My server is a solaris10 server. I need help on this Converting xlsx file from my office 2010 exel works fine

Cell Links lost during conversion

Some of the cells in a xlsx sheet contain links to external websites. When converted all the links are lost and only the text inside the links remain.

test

becomes

test

Is there an option to maybe give output as:
test

Trailing zeros truncated

If a cell contains something like "21.10", xlsx2csv interprets it as a number and truncates the zero. This is bad if the cell is intended to represent a version number or something like that.
These cells are formatted as numbers in Excel, with a fixed number of digits after the decimal point. I have hundreds of files, so I can't change the formats by hand.

The best fix would be to simply retain leading and trailing zeros in the csv.

Another fix would be to interpret numbers with trailing or leading zeros as strings. (maybe with a flag)

Another fix would be a flag to treat all cells as strings, rather than numbers.

Converting all sheets raises 'cmd' not defined

Using with --all or -s 0 causes:

Traceback (most recent call last):
File "/usr/local/bin/xlsx2csv", line 550, in
xlsx2csv.convert(outfile, sheetid)
File "/usr/local/bin/xlsx2csv", line 168, in convert
if cmd:
NameError: global name 'cmd' is not defined

Displayed Value vs Actual Value

Importing xlsx files gets the displayed value (seen on the sheet) but sometimes the "real" value is desired (the value seen in the fx bar when a cell is selected).

Is it possible for this to be an option? Displayed vs actual values depending on a provided option to the command?

PyPI, library

Hi, great tool! Consider adding it to PyPI so people can install it with pip, and also possibly exposing it additionally as a library. Thanks!

issue with empty start row

Hi,

The parser seems to turn into troubles with xlsx files with empty row 1.
It returns:

...
self.colNum = cellId[:len(cellId)-len(self.rowNum)]
TypeError: object of type 'NoneType' has no len()

Without success, I tried to fix it by "ignoring" the cell with NoneType and go on with empty value, but this makes the data in following rows to be missing in the CSV output.

It only happens if the input XLSX has an entirely empty row 1.

Do you have an idea how to deal with this issue?

Thanks for help and thanks for the great .py!

Convert Entire Directory

Hi,

I've got a directory filled with xlsx's that I wanted to convert to csv. Is there any easy way to batch convert?

Formatting minutes

The tool exports minutes < 10 as single characters, for example 14:05 is exported as 14:5. For some routines which import this value, it is converted to 0145, because these only read the digits.

If you can change line 545 to this: self.data = str(t / 60) + ":" + ('0' + str(t % 60))[-2:]
then the minutes will always have two digits, even when 0.

Проблема с страницами в документе

Привет. Имеется проблема с разбором многостраничных документов. Приблизительно проблема заключается в том, что id документа в .../worksheets/sheet*.xml не обязательно совпадает с номером страницы.

ps. Я не Питонист, да и в формате Excel 2007 не разбираюсь, сейчас поковыряю, может смогу разобраться.

Modify README file ; working link to bash script & link to perl package

Hi,

In the Alternatives section, kindly change this entry
Bash:
http://kirk.webfinish.com/2009/12/xlsx2csv/
TO
Bash:
http://kirk.webfinish.com/?p=91 <-- direct link to bash script which works now.

Also, please add the perl alternative too, if possible.
And which is achived like this:
wget http://repo.iotti.biz/CentOS/6/noarch/xls2csv-1.06-16.el6.lux.1.noarch.rpm
yum localinstall xls2csv-1.06-16.el6.lux.1.noarch.rpm -y

Thanks.

Trouble running on Windows 7

I get this error, when running on Windows 7:

C:\Users\jh>python.exe C:\1\xlsx2csv.py
File "C:\1\xlsx2csv.py", line 286
date = datetime.datetime(1904, 01, 01) + datetime.timedelta(float(data))
^
SyntaxError: invalid token

Am I doing it wrong?

Problem with float

I have problem with float.
In xlsx-file i have 0.103 (general or number). After convert in csv i got 0.10299999999999999
The same, 0.276 -> 0.27600000000000002

Throws traceback on some

Hi! Thank you for the tool!
Unfortunately, it throws a traceback, while tries to convert one some files:

Traceback (most recent call last):
  File "./xlsx2csv.py", line 477, in <module>
    xlsx2csv(args[0], None, args[1], **kwargs)
  File "./xlsx2csv.py", line 121, in xlsx2csv
    styles = parse(ziphandle, Styles, "xl/styles.xml")
  File "./xlsx2csv.py", line 179, in parse
    instance.parse(f)
  File "./xlsx2csv.py", line 222, in parse
    numFmtId = int(numFmt._attrs['numFmtId'].value)
AttributeError: Text instance has no attribute '_attrs'

Here is file itself: http://www.sima-land.ru/files-local/price/igrushki.xlsx

Extra Carridge Returns (CR) codes at the end of lines - please strip

Please strip CR (carriage return, \r, 0x0D) from output lines. Now:

$ xlsx2csv test.xlsx | cat -A | head -2
test, 1^M$
test, 2^M$

Something like this to detect if running under modern Windows (pseudo code):

STRIP_OUTPUT = true

# example: ProgramData=C:\ProgramData
if <environment variable "ProgramData" exists>
   AND
   <it matches regexp "[a-z]:\">
then
   STRIP_OUTPUT = false
endif

does not include trailing comma when last column is empty

given the headers A,B,C and data stuff,more stuff, [empty] the csv output doesn't not include a comma after "more stuff".

KeyError: 'appName'

Using the version in pip, I found this error on some of my workbooks:

Traceback (most recent call last):
  File "/usr/local/bin/xlsx2csv", line 548, in <module>
    xlsx2csv = Xlsx2csv(options.infile, **kwargs)
  File "/usr/local/bin/xlsx2csv", line 155, in __init__
    self.workbook = self._parse(Workbook, "xl/workbook.xml")
  File "/usr/local/bin/xlsx2csv", line 219, in _parse
    instance.parse(filehandle)
  File "/usr/local/bin/xlsx2csv", line 233, in parse
    self.appName = workbookDoc.firstChild.getElementsByTagName("fileVersion")[0]._attrs['appName'].value
KeyError: 'appName'

zipfile.BadZipfile: File is not a zip file

I tried to use the script but gave me the above error . find details below :

[xlsx2csv master] $ ./xlsx2csv.py /home/rtcoms/Desktop/bank_wise_csv_ifsc_codes/IFCB2009_02.xls /home/rtcoms/Desktop/code/personal_projects/ifcs_details/tmp/1.csv

Traceback (most recent call last):
File "./xlsx2csv.py", line 443, in
xlsx2csv(args[0], outfile, **kwargs)
File "./xlsx2csv.py", line 115, in xlsx2csv
ziphandle = zipfile.ZipFile(infilepath)
File "/usr/lib/python2.6/zipfile.py", line 696, in init
self._GetContents()
File "/usr/lib/python2.6/zipfile.py", line 716, in _GetContents
self._RealGetContents()
File "/usr/lib/python2.6/zipfile.py", line 728, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

problem with numbers

Hi,
I am facing a problem with xlsx files that have number such as 9.810070626E+019 (which shows in excel as 98100707530000000000) --> with xlsx2csv this becomes 98100707530000007168 when viewed with text editor. However, with OpenOffice xls-to-csv macro the value is shown as it should as 98100707530000000000.

Both of these are shown correctly when the csv is viewed in e.g. OpenOffice, but I'm taking the csv into different software, e.g. Pentaho Data Integration and there the values show as in a text editor..

Any help appreciated
--patrik

Date formatting ("dates since ...")

First, thank you for this amazing library.

The date formatting seems to be really bad though. It converts "4/5/2014" into "04-05-14". This format makes it very hard to convert back into a date. PHP strtotime() interprets it as May 14, 2004. At the very least, the year should be 4 digits, that would help a lot.

Additionally, Excel sometimes stores dates as simply an integer representing the number of days since January 1, 1900 (or January 1, 1904 on Mac for some reason). It would be excellent if those could also be converted to date strings as well.

floating errors

Excel exported csv Line.
;G3;Dachbox;320;;;;Schwarz Metallic;Arjes;;;133.8;72;37.5;Dach

xlsx2csv exported csv Line
G3Dachbox~~320~~~~Schwarz Metallic~~Arjes~~~133.800000000000017237.5~Dach

Found some more samples in my Files.
Excel field format: General
Original Value: 133.8
CSV Value: 133.80000000000001

I Also found: 7.4999999999999997E-2 instead of 0.075

ValueError: invalid literal for int() with base 10: ''

Got this error today with a xlsx file (tested with 0.6.1):

Traceback (most recent call last):
File "./xlsx2csv.py", line 843, in
xlsx2csv = Xlsx2csv(options.infile, **kwargs)
File "./xlsx2csv.py", line 166, in init
self.workbook = self._parse(Workbook, "xl/workbook.xml")
File "./xlsx2csv.py", line 268, in _parse
instance.parse(filehandle)
File "./xlsx2csv.py", line 314, in parse
if 'r:id' in attrs: id = int(attrs["r:id"].value[3:])
ValueError: invalid literal for int() with base 10: ''

Sometimes a few cells are messed in the attached XLSX

Hi!

Unfortunately sometimes the attached XLSX cannot be converted.

I haved attached the expected output as well - please rename the jpg :-)

NOTE: I'm using the options --sheet=1 --delimiter=x124

Are you able to reproduce the problem and to help me?

THANKS A LOT!!!

ratert

fails on large xlsx files

I am trying to convert some large xlsx files in the range of 60,000 to 85,000 lines.
The program fails with the error:

$ xlsx2csv.py -s 1 60656.xlsx 60656.csv
Traceback (most recent call last):
File "/home/Alex/mybin/xlsx2csv.py", line 412, in
xlsx2csv(args[0], outfile, **kwargs)
File "/home/Alex/mybin/xlsx2csv.py", line 98, in xlsx2csv
workbook = parse(ziphandle, Workbook, "xl/workbook.xml")
File "/home/Alex/mybin/xlsx2csv.py", line 125, in parse
instance.parse(ziphandle.read(filename))
File "/home/Alex/mybin/xlsx2csv.py", line 135, in parse
self.appName = workbookDoc.firstChild.getElementsByTagName("fileVersion")[0]._attrs['appName'].value
IndexError: list index out of range

I am using python 2.7 under linux.
I have tested the script with other xlsx files (smaller) and had no issus.

Thanks
Alex.

Shbang line should use /usr/bin/env python instead of /usr/bin/python

If the correct Python is somewhere else in the path, you can't run xlsx2csv.py directly.

Invalid xlsx file error

I am getting the following error from an excel file I have (Microsoft Excel 97-2004 workbook):

Invalid xlsx file: ...

When I looked at the source code, it seems to be coming from the following line:

        self.ziphandle = zipfile.ZipFile(xlsxfile)

Any idea why I get this error and any work-around?

UnicodeEncodeError when trying to convert xlsx file to csv

I tried : python xlsx2csv.py myfile.xlsx myfile.csv

Complete error message :

Traceback (most recent call last):
File "xlsx2csv.py", line 164, in
xlsx2csv(sys.argv[1], f)
File "xlsx2csv.py", line 22, in xlsx2csv
Sheet(shared_strings, ziphandle.read("xl/worksheets/sheet1.xml"), writer)
File "xlsx2csv.py", line 94, in init
self.parser.Parse(data)
File "xlsx2csv.py", line 154, in handleEndElement
self.writer.writerow(d)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)

dd/mm/yyyy date type missing

So in the UK, where dates are typically of this type, they were converted to numbers (as in the number excel uses to store a date).

To fix this, I added:
'dd/mm/yyyy' : 'date',
on line 49

blank lines in conversion

The conversion from myfile.xslx to csv works fine on Linux but with the same options (defaults ones :o) ) the conversion on windows doubles the number of lines : it adds one blank line every lines !

Do you have an idea of what's wrong in the way i use xlsx2csv ?
Is there any option that may solve this pb ?
Thanks.

only first sheet converted to csv.

I ran xlsx2csv with xlsx that had multiple sheets. But, xlsx2csv converted only one sheet [first sheet] to csv.

Updating links in xlsx sheet before conversion

I am not sure if this is strictly in the purview of this program.

I was trying to use the program to convert an excel sheet A.xlsx which had a link to B.xlsx.
I changed B.xlsx and then ran xlsx2csv.py A.xlsx A.csv

As expected, the change done in B.xlsx did not get reflected in A.csv

But note A.xlsx, it is already set to "Automatically update links"

The only way it works is this way - Open A.xlsx and just do a save. Then do the conversion.

Is there a way in python to force an xlsx to save itself (at command prompt) - before doing the conversion.

All this is being done in a windows machine - but I guess any trick available, I should be able to map from linux to windows.

Kind Regards

inlineStr support

I ran across a xlsx file using the inlineStr method instead of the more common sharedStrings approach. The conversion failed because the inlineStr are stored in a tag instead of the normal tag.

Adding this to the if then block in the handleStartElement method allows for proper conversion of inlineStr. Please test and add to next version:

  elif self.in_cell and name == 'is':
        self.in_cell_value = True