Giter VIP home page Giter VIP logo

excelcompare's Introduction

Introduction

ExcelCompare is a command line tool (coming soon API) to diff Excel / Open document (ods) (Open office, Libre office) spreadsheets.

It uses the Apache POI library to read Excel files. And the [OdfToolkit] (http://incubator.apache.org/odftoolkit) library to read Open document (ods) files.

This software is distributed under the MIT license.

Features / Limitations

  • Identifies extra cells / sheets in addition to common cells.
  • Prints all diffs & extra cells on stdout.
  • Process exits with 0 if workbooks match, 1 otherwise.
  • Works with xls, xlsx, xlsm, ods. You may compare any of these with each other.
  • Compares only cell "contents". Formatting, macros are currently ignored.
  • Using --ignore1 & --ignore2 (both optional) you may tell the diff to skip any number of sheets / rows / columns / cells.
  • Other flags to control diffing (see below for description of these): --diff_numeric_precision, --diff_ignore_formulas, --diff_format.

Report bugs / issues / requests here

Installation

Prerequisites

  • Requires Java 1.8 or higher.
  • Assumes Java is added to PATH (to check open a cmd and run java -version)
  • No other platform specific requirements
  • A shell script and a bat script are packaged

Just download the zip file.

Extract it anywhere (and optionally you add the extracted folder to PATH).

macOS

Homebrew makes it easy to install ExcelCompare:

$ brew update
$ brew install excel-compare

Usage

$ excel_cmp <diff-flags> <file1> <file2> [--ignore1 <sheet-ignore-spec> ..] [--ignore2 <sheet-ignore-spec> ..]

Notes:

  • --ignore1 (file1) and --ignore2 (file2) are independent of each other
  • Give one and only one <sheet-ignore-spec> per sheet
  • File path is assumed relative to current directory unless full path is provided
  • file1 and/or file2 can be '/dev/null' or '\.\NUL' (on windows) which is treated as empty file. This is useful for using ExcelCompare for git diff.

Diff flags

  • --diff_numeric_precision: by default numbers are diffed with double precision, to change that specify this flag as --diff_numeric_precision=0.0001
  • --diff_ignore_formulas: by default for cells with formula, formula is compared instead of the evaluated value. Use this flag to compare evaluated value instead
  • --diff_format: by default output is in 'excel_cmp' format, use --diff_format=unified to output in Unified Diff format instead

Sheet Ignore Spec

<sheet-name>:<row-ignore-spec>:<column-ignore-spec>:<cell-ignore-spec>
  • Leaving <sheet-name> blank corresponds to this spec applying to all sheets, for example ::A will ignore column A in all sheets
  • To ignore whole sheet, just provide <sheet-name>
  • Any cell satisfying any ignore spec in the sheet (row, col, or cell) will be ignored in diff
  • You may provide only <cell-ignore-spec> as - <sheet-name>:::<cell-ignore-spec>

Row Ignore Spec

<comma sep list of row or row-range>
  • Row numbers begin from 1
  • Range of rows may be provide as: 1-10
  • Rows and ranges may be mixed as: 1-10,12,20-30 etc.

Column Ignore Spec

<comma sep list of column or column-range>
  • Similar to Row Ignore Spec
  • Columns are letters starting with A

Cell Ignore Spec

<comma sep list of cell or cell-range>
  • Similar to Row Ignore Spec
  • Cells are in usual Excel notation A1 D10
  • Range may be provided as A1-D10

Examples

  • Diff all cells

      excel_cmp 1.xlsx 2.xlsx
    
  • Ignore Sheet1 in 1.xlsx

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1
    
  • Ignore Sheet1 in both

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1 --ignore2 Sheet1
    
  • Ignore column A in both

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1::A --ignore2 Sheet1::A
    
  • Ignore column A across all sheets in both

      excel_cmp 1.xlsx 2.xlsx --ignore1 ::A --ignore2 ::A
    
  • Ignore columns A,D and rows 1-5, 20-25

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1:1-5,20-25:A,D --ignore2 Sheet1:1-5,20-25:A,D
    
  • Ignore columns A,D and rows 1-5, 20-25 and cells F6,H8

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1:1-5,20-25:A,D:F6,H8 --ignore2 Sheet1:1-5,20-25:A,D:F6,H8
    
  • Ignore column A in Sheet1 and column B in Sheet2

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet1::A Sheet2::B --ignore2 Sheet1::A Sheet2::B
    
  • Ignore cells A1-B10 in Sheet2 of both files

      excel_cmp 1.xlsx 2.xlsx --ignore1 Sheet2:::A1-B10 --ignore2 Sheet2:::A1-B10
    
  • Ignore column A in all sheets of both files

      excel_cmp 1.xlsx 2.xlsx --ignore1 ::A --ignore2 ::A
    

Native ("excel_cmp") Output format

  • Each diff or extra cell is reported per line as follows

      DIFF  Cell at      <Cell> => <Value1> v/s <Value2>
      EXTRA Cell in <WB> <Cell> => <Value>
    
  • Then a summary

      ----------------- DIFF -------------------
      Sheets: [<Set of sheets with diffs>]
      Rows: [<Set of rows with diffs>]
      Cols: [<Set of columns with diffs>]
      ----------------- EXTRA WB1 -------------------
      Sheets: [<Set of extra sheets in WB1>]
      Rows: [<Set of extra rows in WB1>]
      Cols: [<Set of extra columns in WB1>]
      ----------------- EXTRA WB2 -------------------
      Sheets: [<Set of extra sheets in WB2>]
      Rows: [<Set of extra rows in WB2>]
      Cols: [<Set of extra columns in WB2>]
      -----------------------------------------
    
  • Then one line

      Excel files <file1> and <file2> differ|match
    

Examples

  • Diffs in cells and extra cells
> excel_cmp xxx.xlsx yyy.xlsx
DIFF  Cell at     Sheet1!A1 => 'a' v/s 'aa'
EXTRA Cell in WB1 Sheet1!B1 => 'cc'
DIFF  Cell at     Sheet1!D4 => '4.0' v/s '14.0'
EXTRA Cell in WB2 Sheet1!J10 => 'j'
EXTRA Cell in WB1 Sheet1!K11 => 'k'
EXTRA Cell in WB1 Sheet2!A1 => 'abc'
EXTRA Cell in WB2 Sheet3!A1 => 'haha'
----------------- DIFF -------------------
Sheets: [Sheet1]
Rows: [1, 4]
Cols: [A, D]
----------------- EXTRA WB1 -------------------
Sheets: [Sheet1, Sheet2]
Rows: [1, 11]
Cols: [B, K, A]
----------------- EXTRA WB2 -------------------
Sheets: [Sheet1, Sheet3]
Rows: [10, 1]
Cols: [J, A]
-----------------------------------------
Excel files xxx.xlsx and yyy.xlsx differ
  • Only extra cells
excel_cmp xxx.xlsx yyy.xlsx --ignore1 Sheet1 --ignore2 Sheet1
EXTRA Cell in WB1 Sheet2!A1 => 'abc'
EXTRA Cell in WB2 Sheet3!A1 => 'haha'
----------------- DIFF -------------------
Sheets: []
Rows: []
Cols: []
----------------- EXTRA WB1 -------------------
Sheets: [Sheet2]
Rows: [1]
Cols: [A]
----------------- EXTRA WB2 -------------------
Sheets: [Sheet3]
Rows: [1]
Cols: [A]
-----------------------------------------
Excel files xxx.xlsx and yyy.xlsx differ
  • No diff
excel_cmp xxx.xlsx yyy.xlsx --ignore1 Sheet1 Sheet2 Sheet3 --ignore2 Sheet1 Sheet2 Sheet3
----------------- DIFF -------------------
Sheets: []
Rows: []
Cols: []
----------------- EXTRA WB1 -------------------
Sheets: []
Rows: []
Cols: []
----------------- EXTRA WB2 -------------------
Sheets: []
Rows: []
Cols: []
-----------------------------------------
Excel files xxx.xlsx and yyy.xlsx match

Unified Diff output format

  • Diffs are reported in the "unified diff" style, with no surrounding context (i.e., a la diff -U0).
  • Each sheet containing a diff or an extra cell begins with a header as follows: --! ++!
  • Each row containing a diff or an extra cell begins with a line that identifies a contiguous series of cells in the row as follows: @@ , , @@
  • Each diff or extra cell is reported as follows: - -... - + +... +
  • If there are multiple series of diff or extra cells, the row header and cell data will be repeated, with the column numbers idetifying the start and end of each series.
  • There is no summary, and if there are no diffs and no extra cells, the output is empty.

Examples

  • Diffs in cells and extra cells
> excel_cmp --diff-format=unified xxx.xlsx yyy.xlsx
--- xxx.xlsx!Sheet1
+++ yyy.xlsx!Sheet1
@@ A1,B1 A1,B1 @@
-a
-cc
+aa
+
@@ D4 D4 @@
-4.0
+14.0
@@ J10 J10 @@
-
+j
@@ K11 K11 @@
-k
+
--- xxx.xlsx!Sheet2
+++ yyy.xlsx!Sheet2
@@ A1 A1 @@
-abc
+
--- xxx.xlsx!Sheet3
+++ yyy.xlsx!Sheet3
@@ A1 A1 @@
-
+haha
  • Only extra cells
> excel_cmp --diff-format=unified xxx.xlsx yyy.xlsx --ignore1 Sheet1 --ignore2 Sheet1
--- xxx.xlsx!Sheet2
+++ yyy.xlsx!Sheet2
@@ A1 A1 @@
-abc
+
--- xxx.xlsx!Sheet3
+++ yyy.xlsx!Sheet3
@@ A1 A1 @@
-
+haha
  • No diff
> excel_cmp --diff-format=unified xxx.xlsx yyy.xlsx --ignore1 Sheet1 Sheet2 Sheet3 --ignore2 Sheet1 Sheet2 Sheet3

excelcompare's People

Contributors

da006ag avatar jawshooah avatar na-ka-na avatar nehaa avatar rosspatterson avatar sclegg avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

excelcompare's Issues

How to compare only Excel values without formulas?

Is there a way to compare only the shown values, without comparing the formulas behind them?
In our application the formulas change but not the values. It would be great if there was an ignore option for formulas.

Cannot process very large workbooks

I have a workbook which is approximately 6.5 MB. Unfortunately, the JRE runs out of memory every time I run ExcelCompare on two similar versions of this workbook (for example only one cell added). I tried increasing the JRE heap size to 1500m (I couldn't go much higher) without success. The Excel Compare program works great for smaller workbooks.

Would it be possible to accommodate larger workbooks? Is there some settings that I could use to reduce memory usage and perhaps only parse one part of the workbook at a time?

I know this is a fairly extreme case so I am not too concerned if a solution is not feasible.

com/ka/excelcmp/ExcelCompare : Unsupported major.minor version 51.0

kahmed@cr23788-kahmed:~/EXCEL-COMPARE/ExcelCompare-0.2$ bin/excel_cmp
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/ka/excelcmp/ExcelCompare : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.ka.excelcmp.ExcelCompare. Program will exit.

Error: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer

I am running windows 7, with the following java version:
Java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

And when I try to run the following command(under script folder):
excel_cmp 1.xlsx 2.xlsx (Where 1.xlsx and 2.xlsx are two excel files that I copied inside script folder)

I keep getting an error:
Error: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer

Any help would be really appreciated!!

Feature request: Support for column headers

I have two spreadsheets with identical column headers in the first row, but different data. It would be good to show the column header in the comparison output, rathe than the cell reference ("B2")

Feature request: Compare by SheetName!RangeName

Is it possible to add a new feature to compare two workbooks and its worksheets based on the range names instead of using cell based reference. Also to use the range names if available when reporting extra and diff. Couple of reasons for this request 1) Currently, if someone enters a new row in the beginning of the worksheet and then compare then none of the cells match 2) When using range names the cell could actually be located in different cells and not be tied to a particular location 3) Make comparing easier when using range names than cell references.

STDOut could be easier to read

I love the tool, I would like it even more if she output would be a bit more readable. Currently it is quite hard to spot differences at first glance, especially with multiline cells.

Need to give the JVM more memory to prevent OutOfMemoryError at runtime

I was diff'ing a 2000 line, 25 column spreadsheet and and got:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I fixed it by altering the excel_cmp.bat file to give the java virtual machine more memory by adding -Xmx1024m to the options passed to the java command:
java -ea -Xmx1024m -cp "%dirname%\dist\*;" com.ka.spreadsheet.diff.SpreadSheetDiffer %*

Can I suggest this is rolled into your next update to cover this problem for others?

Error on cygwin: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer

I am having an issue using the shell script version of the tool.
I tried the following test, and it was able to run correctly.

$ cd path\to\excelcompare\bin
$ java -ea -Xmx512m -cp "dist/*" com.ka.spreadsheet.diff.SpreadSheetDiffer 1.xlsx and 2.xlsx

However when I execute from outside excelcompare/bin directory, I get the error message in the title of this issue.

Here are a few of my system settings:
$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) Client VM (build 25.101-b13, mixed mode)

$ echo $PATH
/cygdrive/c/ExcelCompare-0.6.0/bin/:/cygdrive/c/ExcelCompare-0.6.0/bin/dist/:/cygdrive/c/Program Files/Java/jre1.8.0_101/bin
(only part of PATH shown)

Any help would be appreciated.

Cheers.

Bug unified diff format does not work

I tried unified diff-format but the flag is not interpreted correctly. I also put the flag at the end, but this just gives the default format

$ excel_cmp --diff-format=unified all_stimuli.xlsx all_stimuli_cleaned.xlsx
File: --diff-format=unified does not exist.

Could Not Load

Getting this error when trying to compare files:
'Error: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer'

Downloaded the latest version just a few minutes ago.. I have this script on my home desktop and it works perfectly.

Diff failed: Cannot get a text value from a numeric formula cell

Hi I am attempting to compare two instances of my accounts spreadsheets, one in xls and the other in xlsx format. After a couple of seconds the program exited with the message "Diff failed: Cannot get a text value from a numeric formula cell". Thbe sheet does contain formula cells whose result is text. Any ideas?

Thanks,

Chris

Diff failed: Failed to read as excel file:

As reported in a comment to this answer, ExcelCompare is failing on a regular xls file with the following error message:

Diff failed: Failed to read as excel file: problema.xls

The file opens regularly in LibreOffice, and is parsed fine by the xlrd python library.

Unfortunately this is not a file I can post publicly, and if I open it (with LibreOffice) and "Save as" a new file the problem already disappears (so I cannot censor sensitive parts). I have no access to the computer where the file was produced.

Feel free to let me know if there is something I can do to debug the issue. This is what the command "file" says of this file:

problema.xls: Composite Document File V2 Document, Little Endian, Os: Windows, Version 4.10, Code page: 1252, Last Saved Time/Date: Thu Jan  8 19:06:04 2015

(and this is what it says after resaving with LibreOffice:

problema2.xls: Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Revision Number: 0, Last Saved Time/Date: Thu Jan  8 19:06:04 2015

)

Out of memory error

I have two sheets that are pretty big. Around 6 MB each. When comparing them, this is what happens:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(ArrayList.java:132)
    at java.util.ArrayList.<init>(ArrayList.java:139)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
    at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
    at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:186)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:91)
    at com.ka.spreadsheet.diff.SpreadSheetDiffer.loadSpreadSheet(SpreadSheetDiffer.java:162)
    at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:81)
    at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:61)
    at com.ka.spreadsheet.diff.SpreadSheetDiffer.main(SpreadSheetDiffer.java:54)

Update README with --diff_numeric_precision

It would be nice if the tool could skip reporting very small numeric differences (eg reporting 0.000001 vs 0.000002)

You could optionally specify a threshold on the command line that if specified would only report numeric differences greater than the threshold. (eg specifing "--numeric-threshold 0.001") would prevent the example above from being reported.

Great Tool.
Thanks

Ignore custom / unknown functions

I'm trying to diff two spreadsheets that were created with google spreadsheets. In them I use a number of google docs specific functions such as REGEXMATCH which fails the diff with Diff failed: Name 'REGEXMATCH' is completely unknown in the current workbook

  • it would be great to specify a list of functions to ignore or maybe just a flag to ignore unknown Names.

Prevent window from closing.

I am trying to tie this in with SmartGit.
I have it sending both the two files I want to diff.
Due to the lack of a gui and no place for the output to be displayed I went into the excel_cmp.bat and changed it to

start java -ea ...... etc

Now a window opens and I can see after a second or two the output comes to the window... but then it closes instantly and I cannot view anything.

Do you know if there is a way to prevent the closer of the window?

.2 version failure

I receive ...

Exception in thread "main" java.lang.UnsupportedClassVersionError: com/ka/excelcmp/ExcelCompare : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

...when using v0.2.

Output looks good using v0.1

I work on a 2012 MacBook Air running OS X Version 10.8.2.

Other info:

java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-11M3909)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

`fatal: external diff died, stopping at` error when using with git

The fatal: external diff died, stopping at Project_A.xls appears when using excel_cmp attribute for git diff. This seems like an expected behavior, perhaps a more graceful exiting could be used. Any ideas?

The setup is similar to this:
~/.gitconfig

[diff "xls"]
binary = true
command = ~/bin/ExcelCompare/bin/excel_cmp

~/.config/git/attributes

*.xls diff=xls

Doesn't seem to work

Hello,

I followed the download and use instructions (I'm using Linux) and the results are below. I don't think the command given in the instructions '$ excel_cmp ' is correct. If it's a shell script it surely needs to be './excel_cmp'. Disappointing that it doesn't work. Any help appreciated.

I had changed directory into the 'scripts' directory (ExcelCompare-master/scripts) for the examples below.

tcd@bramber:/$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb6u1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

tcd@bramber:/$ excel_cmp lbs_membership_database.xls lbs_membership_database_new.xls
-bash: excel_cmp: command not found

tcd@bramber:/$ ./excel_cmp lbs_membership_database.xls lbs_membership_database_new.xls
Exception in thread "main" java.lang.NoClassDefFoundError: com/ka/excelcmp/ExcelCompare
Caused by: java.lang.ClassNotFoundException: com.ka.excelcmp.ExcelCompare
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: com.ka.excelcmp.ExcelCompare. Program will exit.
tcd@bramber:/$

Help with git diff?

Hi, this tool seems very useful and I'd like to try using it with my git-managed spreadsheets.

#30 gives some example config. I have the gitattributes working, but in the gitconfig, command doesn't seem to do anything - I still get the "binary files differ" message. (Using textconv works as expected.) I couldn't find anything in the git documentation about this feature. Any ideas what to try? I'm using git 2.13.6 on Fedora.

Also:
http://programmaticallyspeaking.com/git-diffing-excel-files.html
https://stackoverflow.com/questions/20069976/setup-git-diff-for-special-file-types

I haven't got this far yet, but the first link claims we need a wrapper script to pass arguments 2 and 5 to ExcelCompare, while the other sources don't do this. Which way is it? Maybe it's because that was an older version of ExcelCompare?

Comparing worksheets with the same name

Currently, between two Excel files it just compares the sheets within in sequential order. Instead it should compare the sheets with the same name in the two files.

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceede d

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceede
d
at java.util.regex.Matcher.(Unknown Source)
at java.util.regex.Pattern.matcher(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRichTextString.utfDecode(XSSFRichTe
xtString.java:481)
at org.apache.poi.xssf.usermodel.XSSFRichTextString.getString(XSSFRichTe
xtString.java:297)
at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.ja
va:247)
at com.ka.spreadsheet.diff.CellExcel.getStringValue(SpreadSheetExcel.jav
a:152)
at com.ka.spreadsheet.diff.CellPos.getStringValue(CellPos.java:34)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.ja
va:99)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.ja
va:61)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.main(SpreadSheetDiffer.java
:54)

BTW...Each file was 1.74MB in size. Only one cell was different in the row 5 for me to test it. It provided the output difference and then my computer's fan kicked in right before getting this error. Computer has 32GB of RAM.

Doesn't compare macros

Steps to Reproduce:

  1. Create/save a empty xlsx file.
  2. Add a macro to the excel file and save it. (It will require the extension to change to xlsm)
  3. Use ExcelCompare to diff the 2 files.

Results:
ExcelCompare says the files match.

Expected Result:
ExcelCompare should say that they are different. Ideally it would even say the name of the new macro was added.

How to run execel comparison?

Hello,

  1. Can you add more details where to put excel files and from where we can execute them? where we can see shell script ?

2.when i run ,it is throwing error like file 2 not found

what is th exact process i need to follow?

Hang on big .ods files.

When I try two almost empty files I get a nice diff, but when I try a real-world example, it hangs.
I can provide examples, if wanted.

Include support for /dev/null (linux) and \\.\NUL (windows)

Currently, using git diff HEAD~1 on a newly committed excel file produces the error File: /dev/null is not a file. on linux. This is because git requests the diff command to compare between the new file and /dev/null (the "empty" file). Ideally ExcelCompare would treat /dev/null as a real, empty excel file, and show all additions made.

Relevant git configuration:

~/.gitconfig

[diff "xls"]
    binary = true
    command = ~/bin/ExcelCompare/bin/excel_cmp

~/.config/git/attributes

*.xls diff=xls

Error: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer

Getting this error when attempting to run the script:
Error: Could not find or load main class com.ka.spreadsheet.diff.SpreadSheetDiffer

Ran it as sudo and got the following stack trace:

Exception in thread "main" java.lang.NoClassDefFoundError: com/ka/spreadsheet/diff/SpreadSheetDiffer
Caused by: java.lang.ClassNotFoundException: com.ka.spreadsheet.diff.SpreadSheetDiffer
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)

The command I'm using is pretty simple:

$ excel_cmp file_one.xls file_two.xls

ran it within the path where the java file in the error is held and got the same results

FeatureRequest: compare numbers with a given precision

Hi,
this is a great tool, and I would like to use it against some computed files.
Sadly, I get thousand of false errors, like:

DIFF Cell at MyTab!M79 => '979.1358516266992' v/s '979.1358516266946'

Would it be possible to put a threshold when comparing numbers?

Thanks!
Luigi

Flag --diff_numeric_precision not working

Hi,
I am not able to use the flag --diff_numeric_precision (using release 0.5.1):

..\ExcelCompare\excel_cmp.bat --diff_numeric_precision=0.1 a.xls b.xls
File: --diff_numeric_precision=0.01 does not exist.

..\ExcelCompare\excel_cmp.bat a.xls b.xls --diff_numeric_precision=0.1
I get the old diff with many "equal" values, like:
DIFF Cell at Planned Fee - All Events!J77 => '56370.68544061038' v/s '56370.685440610396'
DIFF Cell at Planned Fee - All Events!J78 => '56370.68544061038' v/s '56370.685440610396'
DIFF Cell at Planned Fee - All Events!D79 => '76523.98000000001' v/s '76523.98'
DIFF Cell at Planned Fee - All Events!G79 => '76523.98000000001' v/s '76523.98'

thanks
Luigi

Development ideas

Hi,

I work building (big) Excel financial models and source control is a nightmare. I love git (for small personal coding projects) but it doesn't work with Excel, as we all know. I came across your project yesterday after following a link from http://programmaticallyspeaking.com/git-diffing-excel-files.html and it's a bit of a revelation for me.

I've been giving some thought to how I'd approach making your tool more functional for the types of spreadsheet I work with and below are my high level thoughts on how the tool could be structured. The key idea being to track transformations first and then cell diff (hopefully) aligned cells. I know you have this on your to do list and to my way of thinking tracking cell transformations requires quite a different program structure, which as well as being difficult is possibly why you've not got round to it yet.

I know nothing about java and am not a trained programmer but am happy to try to help out (day-job workload permitting) if you're keen. Alternately if the changes are too radical/not where you want to go I could just fork you and go my own way. Let me know.

Anyway here's my thoughts:

Maintain 3 spreadsheets

  • From spreadsheet (immutable)
  • To spreadsheet (immutable)
  • Working spreadsheet, a clone of the From spreadsheet to start with but is gradually edited by applying the changes identified in to two transform sections below

FIND TRANSFORMATIONS
idea is to repress high-volume cell difference reporting that can be easily explained by the single transformation that created the bulk changes

SHEET TRANSFORMATIONS

  • (REPORT) Sheets re-ordered
  • (REPORT) Sheet renamed
  • (REPORT) Sheet copied
  • (REPORT) Sheet deleted

CELL TRANSFORMATIONS

  • (REPORT) Rows added
  • (REPORT) Rows deleted
  • (REPORT) Rows cut & pasted (i.e. moved)
  • (REPORT) Rows copied & inserted (i.e. duplicated)
  • (REPORT) Columns added
  • (REPORT) Columns deleted
  • (REPORT) Columns cut & pasted (i.e. moved)
  • (REPORT) Columns copied & inserted (i.e. duplicated)
  • (REPORT) Cells deleted (shift left)
  • (REPORT) Cells deleted (shift up)
  • (REPORT) Cells inserted (shift left)
  • (REPORT) Cells inserted (shift down)
  • Possibly need to run this loop 2 (or more) times per sheet to catch instances where, for example, columns have been inserted and then rows

FIND CELL DIFFERENCES
Can now compare the cells in the Working spreadsheet to the To spreadsheet

  • Need a method to translate the cell address through the transformations to infer the original cell reference in the From spreadsheet (i.e. reverse apply the transformations) for reporting

FORMULAE CONDENSE

  • Condense the sheet down to just unique formulae on the sheet and the blocks of cells that all share the same formula
  • This is important as in big Excel files it is common practice to fill formulae. For example a spreadsheet might have 500 columns and each row will have a filled-across formula. If the diff reports on each of the 500 differences (per edited row) then you'll spam the output and not be able to easily determine what's changed
  • In VBA you'd do this by comparing the FormulaR1C1 property of the cells for equivalence
  • Output should be a paired list of (block) cell references and the formula in the top, left cell of the block

CELL DIFFERENCES

  • (REPORT) Where cell blocks with matching formulae have been extended or reduced
  • (REPORT) Where matching cell blocks have different formulae
  • (REPORT) Where non-matching cell blocks with different formulae (need to think about this one a bit)

Comparing for large number of rows not happening

I am trying to compare 2 excel files with 11680 rows and 17 Columns which never happens and after some time it shows OutOfMemoryException.
Please find the stack trace below

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTRowImpl.ge
tCArray(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRow.(XSSFRow.java:67)
at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:238)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:201)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java
:188)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkboo
k.java:378)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:166)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.(XSSFWorkbook.java:2
41)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.ja
va:211)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.ja
va:188)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.loadSpreadSheet(SpreadSheet
Differ.java:241)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.ja
va:133)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.ja
va:102)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.main(SpreadSheetDiffer.java
:90)

**

Create formal LICENSE file

This is a great project!

Is it possible this repo could have a LICENSE file with the contents of the MIT license? It is more formal and official than a link into the MIT license template.

Exception in thread "main" java.lang.NoClassDefFoundError: com/ka/spreadsheet/diff/SpreadSheetDiffer

I downloaded this app, and installed it according to the instructions to unzip it to any folder. My path includes references to a java installation as "C:\oracle\product\10.1.0\Client_1\bin;C:\oracle\product\10.1.0\Client_1\jre\1.4.2\bin\client;C:\oracle\product\10.1.0\Client_1\jre\1.4.2\bin".

I am running Windows 7. If I issue the "java -version" command, I get:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

I get the subject diagnostic message when I issue the command:
excel_cmp ....\lock_unlock_events.xlsx ....\lock_unlock_events.xlsx
(both files are the same, as this was just an installation verification test)

Is there any way to get the compare tool to work in my environment?

I can't use your software

my operation system is windows 10 . And i follow your advice to set,but It's not use.
I use the eclipse ee to run your program but miss jar.
so about these two problem,what should I do.

Error: "illegal reflective access operation has occurred"

I've installed version 0.6.1 on Mojave 10.14.6 using Homebrew, and I'm using it to git diff excel files.

Application excel_cmp works fine, except for a warning error:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper (file:/usr/local/Cellar/excel-compare/0.6.1/libexec/poi-ooxml-3.12.jar) to method com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
WARNING: Please consider reporting this to the maintainers of org.apache.poi.util.DocumentHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

It's not a big deal, but it makes the output that much harder to read ...

Diff failed: Error reading Cell at Sheet1!B2: Cannot get a numeric value from a text cell

Hi,

Im getting this error: "Diff failed: Error reading Cell at Sheet1!B2: Cannot get a numeric value from a text cell" 

The spreadsheets only have 2 lines in each. Any help would be appreciated. I am trying to establish why when I add coumn widths with mk-j/PHP_XLSXWriter excel needs to repair my file...

I cant post the files here, but could share them privately.

java.lang.RuntimeException: Error reading Cell at Sheet1!B2: Cannot get a numeric value from a text cell
at com.ka.spreadsheet.diff.CellPos.getCellValue(CellPos.java:42)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:61)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:25)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.main(SpreadSheetDiffer.java:17)
Caused by: java.lang.IllegalStateException: Cannot get a numeric value from a text cell
at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:888)
at org.apache.poi.xssf.usermodel.XSSFCell.getNumericCellValue(XSSFCell.java:215)
at com.ka.spreadsheet.diff.CellExcel.getValue(SpreadSheetExcel.java:178)
at com.ka.spreadsheet.diff.CellPos.getCellValue(CellPos.java:40)
... 3 more

Command not found.

Hi,

I've been careful to download Excel Compare 0.5.1, and not its binary. I've unzipped the file, put it bin in my path, but when I run it I'm getting a excel_cmp: command not found errors.

I'm running java 1.8.0_60.
Any pointers?

Comapare Sheets in same Excel

I know this is not a must thing, but was just curios to know whether we can compare 2 sheets in the same Excel File.

There should be cell format compare option

I noticed that ExcelCompare doesn't notice the difference between cells which are formatted in a different way. If one cell contains a date with time 00:00:00 and the other contains only the date, ExcelCompare thinks that they are the same.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Try to compare 2 big Excel documents:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.(NamespaceContext.java:78)
at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.(NamespaceContext.java:75)
at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFCell.(XSSFCell.java:105)
at org.apache.poi.xssf.usermodel.XSSFRow.(XSSFRow.java:70)
at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.(XSSFWorkbook.java:186)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:91)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.loadSpreadSheet(SpreadSheetDiffer.java:189)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:82)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.doDiff(SpreadSheetDiffer.java:62)
at com.ka.spreadsheet.diff.SpreadSheetDiffer.main(SpreadSheetDiffer.java:55)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.