plutext / docx4j

JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

HTML 0.36% Java 98.15% GAP 0.02% XSLT 1.21% JavaScript 0.01% CSS 0.10% C# 0.14% VBScript 0.01%

docx4j's Introduction

README

What is docx4j?

docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML "packages", including docx, pptx, and xlsx.

It uses JAXB to create the Java representation.

Open existing docx/pptx/xlsx
Create new docx/pptx/xlsx
Programmatically manipulate docx/pptx/xlsx (anything the file format allows)
Document generation via variable replacement, content control data binding, or MERGEFIELD
CustomXML binding (with support for pictures, rich text, checkboxes, and OpenDoPE extensions for repeats & conditionals, and importing XHTML)
Export as HTML
Export as PDF, choice of 3 strategies, see https://www.docx4java.org/blog/2020/09/office-pptxxlsxdocx-to-pdf-to-in-docx4j-8-2-3/
Produce/consume Word 2007's xmlPackage (pkg) format
Apply transforms, including common filters
Font support (font substitution, and use of any fonts embedded in the document)

docx4j for JAXB 3.0 and Java 11+

docx4j v11.4.5 uses the Jakarta XML Binding API 3.0, as opposed to the JAXB 2.x (javax.xml.bind.*) used in earlier versions. Since this release imports jakarta.xml.bind rather than javax.xml.bind, if you have existing code which imports javax.xml.bind, you'll need to search and replace across your code base, replacing javax.xml.bind with jakarta.xml.bind. You'll also need to replace your JAXB jars (Maven will do this for you automatically; otherwise get them from the relevant zip file).
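The search/replace step can be scripted. What follows is a minimal sketch, not a docx4j tool. It assumes your sources live under one directory tree and are UTF-8 encoded. The names JakartaImportMigrator, migrate and migrateTree are hypothetical. It only rewrites the javax.xml.bind package prefix, so other javax.xml packages (for example javax.xml.parsers) are left alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class JakartaImportMigrator {

    // Matches the javax.xml.bind package prefix when followed by '.' or ';',
    // e.g. "javax.xml.bind.JAXBElement" or "javax.xml.bind.annotation.*"
    private static final Pattern JAVAX_BIND = Pattern.compile("javax\\.xml\\.bind(?=[.;])");

    /** Returns the source text with javax.xml.bind replaced by jakarta.xml.bind. */
    static String migrate(String source) {
        return JAVAX_BIND.matcher(source).replaceAll("jakarta.xml.bind");
    }

    /** Rewrites every .java file under root in place (UTF-8 assumed). */
    static void migrateTree(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    String before = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    String after = migrate(before);
                    if (!after.equals(before)) {
                        Files.write(p, after.getBytes(StandardCharsets.UTF_8));
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass your own source root as the first argument
        migrateTree(Paths.get(args.length > 0 ? args[0] : "src/main/java"));
    }
}
```

Because migrateTree rewrites files in place, run it on a clean working copy so the changes can be reviewed as a diff.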

Being a JPMS modularised release, the jars also contain module-info.class entries.

To use it, add the dependency corresponding to the JAXB implementation you wish to use:

docx4j-JAXB-ReferenceImpl
docx4j-JAXB-MOXy

docx4j-8

This is docx4j for Java 8. Although in principle it would compile and run under Java 6, some of its dependencies are Java 8 only. So to run it under Java 6, you'd need to use the same versions of the dependencies that docx4j 6.x uses.

docx4j v8 is a multi-module Maven project.

To use docx4j v8, add the dependency corresponding to the JAXB implementation you wish to use:

docx4j-JAXB-Internal (shipped in Oracle and OpenJDK v8)
docx4j-JAXB-ReferenceImpl (you may need to respect the endorsed dir mechanism for the RI jars)
docx4j-JAXB-MOXy

You should use one and only one of docx4j-JAXB-*

How do I build docx4j?

Get it from GitHub, at https://github.com/plutext/docx4j

mvn clean
mvn install

Some of the tests might fail on Windows. For now, you could skip them: mvn install -DskipTests

For more details, see http://www.docx4java.org/blog/2015/06/docx4j-from-github-in-eclipse-5-years-on/

If you are working with the source code, please join the developer mailing list:

    [email protected]

Where do I get a binary?

http://www.docx4java.org/downloads.html

How do I get started?

See the Getting Started guide: https://github.com/plutext/docx4j/tree/master/docs

and the Cheat Sheet: http://www.docx4java.org/blog/2013/05/docx4j-in-a-single-page/

And see the sample code: https://github.com/plutext/docx4j/tree/master/src/samples

You'll probably want the Helper AddIn to generate code: http://www.docx4java.org/blog/2016/05/docx4j-helper-word-addin-new-version-v3-3-0/

Where to get help?

http://www.docx4java.org/forums or StackOverflow (use tag 'docx4j')

Please post to one or the other, not both

Legal Information

docx4j is published under the Apache License version 2.0. For the license text, please see the following files in the legals directory:

LICENSE
NOTICE

Legal information on libraries used by docx4j can be found in the "legals/NOTICE" file.

docx4j's People

Forkers

lushenbo ragill dodopari ajaydeshwal manivannans dmole sledwich jsallen paulvas gaudaudinh bwoj bezda toamitkumar sgrachov fmmfonseca payden conphident4 jasonjones-wf tarekelasmi hpeng01016 chiragpatel2310 jojada gltianwen daimos rhorenko kairatbmstu rashed89 ebr87 ryzm warmherz velus28 joongho emotl sindref ssamsun tstirrat sike1406 smitchell141 docverter migrain rayapajohesh jeffbeard lwilkinson nyer ahajri osnard yuktgsl thisistian binxue388 jdallapi yildirim06 15jkeee haresh14 matthewvcarey vansuca re6exp staiger pnml franckkm rajeshgp82 choeflake vandar2659 onasusweb ermanishks gengzhengtao kabore leopoldchen bwolff bailuizzie thrawn24 nairobi77 hasan40 nanofish senthilsendhil nshivani aarya23 msticker mr-mig ragunath74 popo112 akashm perkinss brajeshpa07 fachhoch jspark67 guilhermesouzasantos misterjojo tonyliu7183 jerryorr rafaelccruz baylife wdmchaft pocketzwt bigtiger02 encodata guohuichen nidhinpkn marcowitteveen ibeen boy7302001

docx4j's Issues

docx4j 3.0 alpha slf4j

I read that docx4j will use slf4j. Does the latest nightly build support this yet? I downloaded it, and it's still complaining about a log4j NoSuchMethod exception...

Thank you!

field complexification drops bookmarks

Sample input:

    <w:p >
        <w:fldSimple w:instr=" MERGEFIELD  num  \* MERGEFORMAT ">
            <w:r>
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>«</w:t>
            </w:r>
            <w:bookmarkStart w:name="num" w:id="0"/>
            <w:bookmarkStart w:name="_GoBack" w:id="1"/>
            <w:r>
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>num</w:t>
            </w:r>
            <w:bookmarkEnd w:id="0"/>
            <w:bookmarkEnd w:id="1"/>
            <w:r>
                <w:rPr>
                    <w:noProof/>
                </w:rPr>
                <w:t>»</w:t>
            </w:r>
        </w:fldSimple>
    </w:p>

Carriage returns in XML data for mapped Content Controls not working

I posted the issue on the forum here:
http://www.docx4java.org/forums/docx-java-f6/carriage-return-in-content-controls-not-working-t1673.html

SLF4J support

It would be great to change the logging system from log4j to slf4j for better compatibility.

DOMResult can not be this kind of node

paging in pdf

Page numbering in Word produces labels like "page 1 of 3", "2 of 3", "3 of 3". But in the PDF it isn't working: it's just 1 of 1, 2 of 2, 3 of 3.

regenerate org.pptx4j.pml using 2ed XSD

original slide1.xml contains:

   <p:controls>
      <mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
        <mc:Choice xmlns:v="urn:schemas-microsoft-com:vml" Requires="v">
          <p:control spid="1026" name="ShockwaveFlash1" r:id="rId2" imgW="7561440" imgH="5759280"/>
        </mc:Choice>
        <mc:Fallback>
          <p:control name="ShockwaveFlash1" r:id="rId2" imgW="7561440" imgH="5759280">
            <p:pic>
              <p:nvPicPr>
                <p:cNvPr id="0" name="ShockwaveFlash1"/>
                <p:cNvPicPr preferRelativeResize="0">
                  <a:picLocks noChangeArrowheads="1" noChangeShapeType="1"/>
                </p:cNvPicPr>

docx4j complains:

WARN org.docx4j.utils.XSLTUtils .logWarn line 16 - Found some mc:AlternateContent
WARN org.docx4j.utils.XSLTUtils .logWarn line 16 - Selecting p:control
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 90 - [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/presentationml/2006/main", local:"pic"). Expected elements ar
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)

output slide1.xml:

    <p:controls>
      <p:control imgH="5759280" imgW="7561440" r:id="rId2" name="ShockwaveFlash1"/>
    </p:controls>

org.pptx4j.pml.CTControl does not allow for p:pic content, as per the following in pml-embedding.xsd:

    <xsd:complexType name="CT_Control">
        <xsd:sequence>
            <xsd:element name="extLst" type="CT_ExtensionList"
                minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
        <xsd:attributeGroup ref="AG_Ole" />
    </xsd:complexType>

This is from ECMA-376, first edition (as opposed to 2ed, which was not then available).

In the second edition, the model changed (pml.xsd) to:

    <xsd:complexType name="CT_Control">
        <xsd:sequence>
            <xsd:element name="extLst" type="CT_ExtensionList" minOccurs="0" maxOccurs="1"/>
            <xsd:element name="pic" type="CT_Picture" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
        <xsd:attributeGroup ref="AG_Ole"/>
    </xsd:complexType>

So to fix we need to regenerate the pml classes using the second edition pml.xsd.

IllegalArgumentException when loading a .dotx file

Hi, I'm getting an IllegalArgumentException when I try to load a .dotx file (Word template).
Do you have it too? Will try to investigate...

ERROR Context.java:102 () Cannot initialize context
java.lang.IllegalArgumentException
fatal: Not a git repository (or any of the parent directories): .git
at com.sun.xml.internal.bind.v2.model.nav.TypeVisitor.visit(TypeVisitor.java:54)
at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.erasure(ReflectionNavigator.java:350)
at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.asDecl(ReflectionNavigator.java:298)
at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.asDecl(ReflectionNavigator.java:47)
at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:313)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:430)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:277)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1100)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:143)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:110)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:128)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:290)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:372)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:337)
at org.docx4j.jaxb.Context.(Context.java:97)
at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:658)
at org.docx4j.openpackaging.io.LoadFromZipNG.process(LoadFromZipNG.java:206)
at org.docx4j.openpackaging.io.LoadFromZipNG.get(LoadFromZipNG.java:193)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:301)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:245)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:195)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:178)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:172)

Incorrectly interpreting some [Content_Types].xml configurations as Glox packages.

I have a number of DOCX files with a [Content_Types].xml file as follows:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" /> 
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" /> 
  <Default Extension="bin" ContentType="image/jpeg" /> 
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml" /> 
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" /> 
  <Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml" /> 
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml" /> 
  <Override PartName="/word/footnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml" /> 
  <Override PartName="/word/endnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml" /> 
  <Override PartName="/word/webSettings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml" /> 
  <Override PartName="/word/theme/theme.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml" /> 
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml" /> 
  <Override PartName="/word/stylesWithEffects.xml" ContentType="application/vnd.ms-word.stylesWithEffects+xml" /> 
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml" /> 
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml" /> 
  <Override PartName="/word/header.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml" /> 
  <Override PartName="/word/footer.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml" /> 
  <Override PartName="/word/footer2.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml" /> 
  <Override PartName="/media/image4.bin" ContentType="image/png" /> 
  <Override PartName="/graphics/data.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramData+xml" /> 
  <Override PartName="/graphics/layout.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramLayout+xml" /> 
  <Override PartName="/graphics/quickStyle.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramStyle+xml" /> 
  <Override PartName="/graphics/colors.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramColors+xml" /> 
  <Override PartName="/graphics/data2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramData+xml" /> 
  <Override PartName="/graphics/layout2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramLayout+xml" /> 
  <Override PartName="/graphics/quickStyle2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramStyle+xml" /> 
  <Override PartName="/graphics/colors2.xml" ContentType="application/vnd.openxmlformats-officedocument.drawingml.diagramColors+xml" /> 
  <Override PartName="/media/image6.bin" ContentType="image/png" /> 
  <Override PartName="/media/image7.bin" ContentType="image/gif" /> 
</Types>

These are not created with POI. They are created using DocumentBuilder found in the PowerTools for Open XML (http://powertools.codeplex.com/)

As you can see there is a <Default> entry for WordprocessingML and an <Override> entry for DrawingML.

Because ContentTypeManager checks the overrides before it checks defaults, it interprets these files as a DrawingML package and tries to cast to a GloxPackage. When this happens, docx4j throws the following error:

java.lang.ClassCastException: org.glox4j.openpackaging.packages.GloxPackage
        at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:172)

Re-arranging the order of code in ContentTypeManager to check <Default> entries before <Override> entries fixes this issue.
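Instead of reordering the checks as proposed, an alternative is to decide the package type from the content type of the main part only. That one part name would be resolved by the OPC rule: an Override for the exact part name wins, otherwise the Default for its extension applies. The DrawingML overrides above would then never be consulted.

This is a minimal sketch, not docx4j's ContentTypeManager API. It assumes the main part name (e.g. /word/document.xml) has already been read from the officeDocument relationship in /_rels/.rels. PackageTypeDetector and its methods are hypothetical, and macro-enabled and template content types are omitted for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class PackageTypeDetector {

    static final String WORD_MAIN =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
    static final String PPTX_MAIN =
        "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml";
    static final String XLSX_MAIN =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml";

    // <Default Extension=...> entries, keyed by lower-case extension
    private final Map<String, String> defaults = new HashMap<>();
    // <Override PartName=...> entries, keyed by lower-case part name (OPC names are case-insensitive)
    private final Map<String, String> overrides = new HashMap<>();

    void addDefault(String extension, String contentType) {
        defaults.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    void addOverride(String partName, String contentType) {
        overrides.put(partName.toLowerCase(Locale.ROOT), contentType);
    }

    /** OPC rule: an Override for this part name wins; otherwise use the Default for its extension. */
    String contentTypeOf(String partName) {
        String override = overrides.get(partName.toLowerCase(Locale.ROOT));
        if (override != null) return override;
        int slash = partName.lastIndexOf('/');
        int dot = partName.lastIndexOf('.');
        if (dot <= slash) return null; // last path segment has no extension
        return defaults.get(partName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Classifies the package by its main part only, so unrelated overrides cannot mislead it. */
    String detect(String mainPartName) {
        String ct = contentTypeOf(mainPartName);
        if (WORD_MAIN.equals(ct)) return "docx";
        if (PPTX_MAIN.equals(ct)) return "pptx";
        if (XLSX_MAIN.equals(ct)) return "xlsx";
        return "unknown";
    }
}
```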

provide some util methods for working with spreadsheet coordinates

We probably could use some helper methods for working with spreadsheet coordinates.

Here's an example. I'm not sure these have been extensively tested but they seem to work.

  [text edited out so I can figure out how to do a pull request]
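Since the original code was edited out, here is a minimal sketch of what such helpers might look like. It converts between A1-style references and zero-based column/row indices. The class and method names are hypothetical, and only plain references are handled: no $ absolute markers, sheet names or ranges.

```java
import java.util.Locale;

class CellRef {

    /** Converts a zero-based column index to letters: 0 -> "A", 25 -> "Z", 26 -> "AA". */
    static String columnName(int index) {
        if (index < 0) throw new IllegalArgumentException("negative column: " + index);
        StringBuilder sb = new StringBuilder();
        int n = index + 1; // bijective base-26: there is no zero digit
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();
    }

    /** Converts column letters to a zero-based index: "A" -> 0, "AA" -> 26. */
    static int columnIndex(String letters) {
        int n = 0;
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') throw new IllegalArgumentException("bad column: " + letters);
            n = n * 26 + (c - 'A' + 1);
        }
        return n - 1;
    }

    /** Builds an A1-style reference from zero-based column and row indices. */
    static String toRef(int col, int row) {
        return columnName(col) + (row + 1);
    }

    /** Parses e.g. "B3" into {col, row}, both zero-based: {1, 2}. */
    static int[] parseRef(String ref) {
        int i = 0;
        while (i < ref.length() && Character.isLetter(ref.charAt(i))) i++;
        if (i == 0 || i == ref.length()) throw new IllegalArgumentException("bad reference: " + ref);
        int col = columnIndex(ref.substring(0, i));
        int row = Integer.parseInt(ref.substring(i)) - 1;
        return new int[] { col, row };
    }
}
```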

Images in header in HTML output

See http://www.docx4java.org/forums/docx-java-f6/docx-to-html-with-header-images-t968.html

Remove System.out.println from production code

Both the master branch and 2.8.0 contain System.out.println calls. If a document contains a table, "Processing r" and "Processing c" are printed for each row and column.

Suggestion:
Remove unnecessary println statements from org.docx4j.model.table.TableModel#handleRow()

alt-chunk is missing from Paragraph's getContent()

I have a generated docx file with this kind of markup:

<w:p>
     <w:r><w:commentReference w:id="0"/></w:r>
     <w:altChunk r:id="rId16"/>
</w:p>

The problem is that when I traverse this markup, the P element's content (children) collection does not contain the altChunk, so that node is not visited. I'm not sure whether this is valid Word markup, but Word accepts it without complaint.

Can you correct this, please?
Thanks,
Zoltan

add iterators for navigating xlsx rows and columns

I've noticed that for data access, I tend to need to iterate over the data rows and columns in a particular worksheet. While there are many ways to do this, it would be nice to have smart iterators that iterate in row and column order, starting with the lowest index. Since the data in the XML is not guaranteed to be in order, it's very expensive to find the "next" index, but that's the nature of the format.

Here's some example code. It would be nice if this could be included in the domain model in some way to at least provide a default method for sequenced iteration. The iteration next() can return null indicating a "hole" in the spreadsheet.

I just typed in the code below, don't know if it would work and it would need some additional methods on SheetData to work correctly. It would need a SheetData.find() and SheetData.getMaxIndex() method.

[code edited out until I can figure how to do a pull request]
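Since the original code was edited out, here is a minimal sketch of the sequenced iteration described above. It assumes the rows (or cells) of a sheet have already been collected into a map keyed by their 1-based index, as given by the r attribute. SparseSequence is a hypothetical name. Rather than relying on the SheetData.find()/getMaxIndex() methods suggested in the issue, it builds a sorted index once. As in the proposal, next() returns null for a hole.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

class SparseSequence<T> implements Iterable<T> {

    // Sorted copy of the sparse data, so iteration order does not depend on XML order
    private final TreeMap<Integer, T> byIndex;

    SparseSequence(Map<Integer, T> items) {
        this.byIndex = new TreeMap<>(items);
    }

    /** Iterates from the lowest to the highest index, yielding null for each missing index. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = byIndex.isEmpty() ? 0 : byIndex.firstKey();
            private final int last = byIndex.isEmpty() ? -1 : byIndex.lastKey();

            @Override
            public boolean hasNext() {
                return next <= last;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return byIndex.get(next++); // null when this index is a hole
            }
        };
    }
}
```

The sorting cost is paid once in the constructor, so each next() is a single map lookup.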

.tif (as opposed to .tiff)

See http://www.docx4java.org/forums/docx-java-f6/can-not-save-while-the-header-contain-tiff-image-t1004.html

Problem with vertically merged cells

I have a docx document that contains a cell spanning 3 rows. The first time I traverse (via TraversalUtil) to the cell, Tc.getTcPr().getVMerge().getVal() correctly returns "restart". As far as I understand http://msdn.microsoft.com/en-us/library/ff951689(v=office.14).aspx#vert_Generating, the next time the value should be "continue", but it is null.

I use 2.8.1 and had the same problem with 2.9-SNAPSHOT.
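A null value here may be expected rather than a bug. In the ECMA-376 definition of w:vMerge, an omitted w:val attribute is to be treated as "continue", so Word commonly writes a bare <w:vMerge/> for continuation cells. The minimal sketch below shows one way to interpret the value. It assumes you already know whether the cell's w:tcPr has a w:vMerge element and have its w:val string (null if absent). VMergeState is a hypothetical helper, independent of docx4j's own classes.

```java
class VMergeState {

    enum Kind { NONE, RESTART, CONTINUE }

    /**
     * @param vMergePresent whether the cell's w:tcPr contains a w:vMerge element
     * @param val the w:val attribute value, or null if the attribute is omitted
     */
    static Kind interpret(boolean vMergePresent, String val) {
        if (!vMergePresent) return Kind.NONE;            // not part of a vertical merge
        if ("restart".equals(val)) return Kind.RESTART;  // first cell of the merged region
        return Kind.CONTINUE;                            // omitted val defaults to "continue"
    }
}
```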

samples.DocProps Fails

CP=$(LS -1|perl -p -e 's/\n/:/g'|perl -p -e 's/:$//g');

echo $CP
Docx4j_GettingStarted.docx:Docx4j_GettingStarted.html:Docx4j_GettingStarted.pdf:antlr-2.7.7.jar:antlr-runtime-3.3.jar:avalon-framework-api-4.3.1.jar:avalon-framework-impl-4.3.1.jar:commons-codec-1.3.jar:commons-io-1.3.1.jar:commons-lang-2.4.jar:commons-logging-1.1.1.jar:docx4j-2.8.0.jar:fop-1.0.jar:itext-2.1.7.jar:jaxb-svg11-1.0.2.jar:jaxb-xmldsig-core-1.0.0.jar:jaxb-xslfo-1.0.1.jar:log4j-1.2.15.jar:poi-3.8.jar:poi-scratchpad-3.8.jar:serializer-2.7.1.jar:stringtemplate-3.2.1.jar:wmf2svg-0.9.0.jar:xalan-2.7.1.jar:xhtmlrenderer-1.0.0.jar:xml-apis-1.3.04.jar:xmlgraphics-commons-1.4.jar

java -cp $CP org.docx4j.samples.DocProps ../test.docx
INFO org.docx4j.utils.Log4jConfigurator .configure line 45 - Since your log4j configuration (if any) was not found, docx4j has configured log4j automatically.
Apple Inc.
1.6.0_31
WARN org.docx4j.XmlUtils . line 128 - Using default SAXParserFactory: null
INFO org.docx4j.jaxb.NamespacePrefixMapperUtils .getPrefixMapper line 55 - Using NamespacePrefixMapperSunInternal, which is suitable for Java 6
INFO org.docx4j.jaxb.Context . line 59 - Using Java 6/7 JAXB implementation
INFO org.docx4j.jaxb.Context . line 76 - loading Context jc
INFO org.docx4j.jaxb.Context . line 84 - loaded com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl .. loading others ..
INFO org.docx4j.jaxb.Context . line 99 - .. others loaded ..
INFO org.docx4j.openpackaging.contenttype.ContentTypeManager .createPackage line 804 - Detected WordProcessingML package
INFO org.docx4j.openpackaging.parts.Part . line 150 - /_rels/.rels
INFO org.docx4j.openpackaging.parts.relationships.RelationshipsPart .unmarshal line 861 - unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
INFO org.docx4j.openpackaging.parts.Part . line 150 - /docProps/core.xml
INFO org.docx4j.openpackaging.parts.DocPropsCorePart .unmarshal line 122 - unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
INFO org.docx4j.openpackaging.parts.Part . line 150 - /docProps/app.xml
INFO org.docx4j.openpackaging.parts.DocPropsExtendedPart .unmarshal line 128 - unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/document.xml
INFO org.docx4j.openpackaging.parts.JaxbXmlPart .unmarshal line 156 - For org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart, unmarshall via binder
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/_rels/document.xml.rels
INFO org.docx4j.openpackaging.parts.relationships.RelationshipsPart .unmarshal line 861 - unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
WARN org.docx4j.openpackaging.contenttype.ContentTypeManager .newPartForContentType line 433 - DefaultPart used for part '/word/stylesWithEffects.xml' of content type 'application/vnd.ms-word.stylesWithEffects+xml'
WARN org.docx4j.openpackaging.parts.Part . line 104 - Couldn't set javax.xml.parsers.DocumentBuilderFactory: org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
INFO org.docx4j.openpackaging.parts.Part . line 110 - Using javax.xml.parsers.DocumentBuilderFactory: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/stylesWithEffects.xml
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/settings.xml
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 90 - [ERROR] : unexpected element (uri:"http://schemas.microsoft.com/office/word/2010/wordml", local:"docId"). Expected elements are <{
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)
INFO org.docx4j.openpackaging.parts.JaxbXmlPart .unmarshal line 243 - encountered unexpected content; pre-processing
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 90 - [ERROR] : unexpected element (uri:"http://schemas.microsoft.com/office/word/2010/wordml", local:"docId"). Expected elements are <{
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 90 - [ERROR] : unexpected element (uri:"http://schemas.microsoft.com/office/word/2010/wordml", local:"defaultImageDpi"). Expected eleme
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/webSettings.xml
INFO org.docx4j.openpackaging.contenttype.ContentTypeManager .getPart line 264 - Looking at extension 'png
INFO org.docx4j.openpackaging.contenttype.ContentTypeManager .getPart line 268 - Found content type 'image/png' for /word/media/image1.png
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/media/image1.png
INFO org.docx4j.openpackaging.parts.Part .setBinaryData line 82 - .. closed.
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/fontTable.xml
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/theme/theme1.xml
INFO org.docx4j.openpackaging.parts.ThemePart .unmarshal line 103 - unmarshalling org.docx4j.openpackaging.parts.ThemePart
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/numbering.xml
INFO org.docx4j.openpackaging.parts.Part . line 150 - /word/styles.xml
INFO org.docx4j.openpackaging.contenttype.ContentTypeManager .getPart line 264 - Looking at extension 'jpeg
INFO org.docx4j.openpackaging.contenttype.ContentTypeManager .getPart line 268 - Found content type 'image/jpeg' for /docProps/thumbnail.jpeg
INFO org.docx4j.openpackaging.parts.Part . line 150 - /docProps/thumbnail.jpeg
INFO org.docx4j.openpackaging.parts.Part .setBinaryData line 82 - .. closed.
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.docx4j.samples.DocProps.main(DocProps.java:65)

Excel splutters on true/false attribute values for tableStyleInfo

From http://www.docx4java.org/forums/pptx-java-f14/pptx-2007-updating-chart-spreadsheet-t1307.html

docx4j writes:

Excel wants:

In sml-table.xsd, the attributes are type="xsd:boolean"

Now, that should allow true/false or 1/0 (i.e. what docx4j is doing is legal), but the fact is that Excel doesn't like it.
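One possible workaround is to emit 1/0 rather than true/false for these attributes, since both are valid xsd:boolean lexical forms. The minimal sketch below shows the conversion in plain Java. In a JAXB setting it would presumably sit inside an XmlAdapter for the affected attributes. XsdBooleans is a hypothetical name, and nothing here is docx4j API.

```java
class XsdBooleans {

    /** Formats a boolean using the numeric xsd:boolean form ("1"/"0"). */
    static String toNumeric(boolean value) {
        return value ? "1" : "0";
    }

    /** Parses any of the four xsd:boolean lexical forms: true, false, 1, 0. */
    static boolean parse(String lexical) {
        switch (lexical.trim()) {
            case "true": case "1": return true;
            case "false": case "0": return false;
            default: throw new IllegalArgumentException("not an xsd:boolean: " + lexical);
        }
    }
}
```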

Tables tblLook: Support ECMA 376 2ed definition.

See http://www.docx4java.org/forums/docx-java-f6/how-to-set-banded-columns-rows-using-docx4j-t1494.html

Incorrect processing of [Content_Types].xml with Default tags

If you replace the Override content type tag for part word/document.xml with a Default content type tag docx4j will fail to load the file, complaining that it can only handle docx files. This behavior of using a Default tag instead of a specific Override tag is how Apache POI outputs docx files, so it is impossible to chain POI to docx4j right now.

Replace:

<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

with:

<Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

Exception:

org.docx4j.openpackaging.exceptions.InvalidFormatException: Unexpected package (docx4j supports docx/docxm and pptx only
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage(ContentTypeManager.java:834)
    at org.docx4j.openpackaging.io.LoadFromZipNG.process(LoadFromZipNG.java:213)
    at org.docx4j.openpackaging.io.LoadFromZipNG.get(LoadFromZipNG.java:193)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:301)

Cell alignment in table with merged cells

Cells have different alignment in the PDF: it just resizes them to align with the merged cells on the left/right.
This is how it looks in word:

And this is how it looks in pdf:

As you can see, the bottom-right cell, which is a simple cell, is aligned to the cell on the left, which is merged from two cells.

SdtPr.setDataBinding() not removing previous binding

In the code snippet from SdtPr below, the remove call is attempting to remove an object that has been unwrapped, but the list contains a wrapped object. Thus using setDataBinding(null) does nothing.

public void setDataBinding(CTDataBinding value) {

    CTDataBinding existingBinding = getDataBinding(); 

    if (existingBinding!=null) {
        if (!existingBinding.equals(value)) {
            log.debug("Changing DataBinding tag from " + existingBinding + " to " + value);
            rPrOrAliasOrLock.remove(existingBinding);
            if (value!=null) {
                rPrOrAliasOrLock.add(value);
            }
        }

A fix could be to replace

            rPrOrAliasOrLock.remove(existingBinding);

with

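            // Remove via the iterator (requires java.util.Iterator): calling
            // rPrOrAliasOrLock.remove(o) inside a for-each loop over the same list
            // can throw ConcurrentModificationException.
            Iterator<Object> it = rPrOrAliasOrLock.iterator();
            while (it.hasNext()) {
                if (XmlUtils.unwrap(it.next()) == existingBinding) {
                    it.remove();
                }
            }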
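The underlying problem can be reproduced without docx4j. This minimal sketch uses a hypothetical Wrapper class standing in for JAXBElement and a hypothetical unwrap method standing in for XmlUtils.unwrap. List.remove(value) compares the value against the wrappers and finds no match. Comparing each unwrapped entry instead removes the right element. removeIf is used so the list is never modified while a for-each loop is iterating it.

```java
import java.util.List;

class UnwrapRemoveDemo {

    /** Hypothetical stand-in for JAXBElement: holds a value and does not override equals(). */
    static final class Wrapper {
        final Object value;
        Wrapper(Object value) { this.value = value; }
    }

    /** Hypothetical stand-in for XmlUtils.unwrap. */
    static Object unwrap(Object o) {
        return (o instanceof Wrapper) ? ((Wrapper) o).value : o;
    }

    /** Removes every entry whose unwrapped value is the given object (identity comparison). */
    static boolean removeUnwrapped(List<Object> list, Object target) {
        return list.removeIf(o -> unwrap(o) == target);
    }
}
```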

Problem with HTML entities

I receive this exception when my HTML contains named entities for characters such as Ã or É.

This is my exception:
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): weblogic.xml.jaxp.RegistryXMLReader
org.docx4j.org.xhtmlrenderer.load INFO:: The entity "atilde" was referenced, but not declared.
ERROR: 'The entity "atilde" was referenced, but not declared.'
org.docx4j.org.xhtmlrenderer.exception WARNING:: Unhandled exception. Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException: The entity "atilde" was referenced, but not declared.
13:27:30,384 DEBUG [SiszarcBean] issues at Line 3, Col 13
13:27:30,390 ERROR [BaseInterceptor] br.gov.mapa.arquitetura.exception.ApplicationException: org.docx4j.openpackaging.exceptions.Docx4JException: issues at Line 3, Col 13

Do you know a solution?
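The log shows the input being parsed as XML, and XML declares only five entities (amp, lt, gt, quot, apos). A named HTML entity like &atilde; is therefore undeclared. One common workaround is to convert named HTML entities into numeric character references before import.

This is a minimal sketch, assuming a small hand-maintained table; a real one would cover the full HTML entity list. HtmlEntityFixer is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlEntityFixer {

    // Small sample of HTML named entities and their code points; extend as needed
    private static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("nbsp", 160);
        ENTITIES.put("Atilde", 195);
        ENTITIES.put("Eacute", 201);
        ENTITIES.put("atilde", 227);
        ENTITIES.put("eacute", 233);
        ENTITIES.put("bull", 8226);
    }

    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with numeric references, e.g. &atilde; becomes &#227;. */
    static String toNumericReferences(String html) {
        Matcher m = NAMED.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer cp = ENTITIES.get(m.group(1));
            // Unknown names (including XML's own amp, lt, gt, quot, apos) are left untouched
            String replacement = (cp != null) ? "&#" + cp + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```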

&bull;

• is getting converted to •

(AbstractHtmlExporter.java:182)

conflicting with

(XMLEscapeUTF8.java:183)
(XMLEscapeWriterUTF8.java:154)
(XMLEscapeASCII.java:159)
(XMLEscapeWriterASCII.java:154)

...

VML support in PDF/HTML output

For example, v:textbox/w:txbxContent would be a good place to start.

https://github.com/plutext/docx4j/blob/master/src/test/resources/vml/textbox.docx

HTML/XSL FO (PDF): handle run fonts correctly.

If a run contains anything other than characters in the ASCII range (character values 0–127), we need to split it up into:

ASCII range (format using @ascii font)
Unicode sub-ranges for East Asian languages (format using @eastAsia font)
Unicode sub-ranges for complex script languages (format using @cs font)
other (format using @hAnsi font)
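The splitting step above could start out like the minimal sketch below. It assumes a simplified classification by Unicode script via Character.UnicodeScript. Word's real rules also depend on w:hint and language settings, and the script lists here are illustrative rather than complete. RunFontSplitter and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class RunFontSplitter {

    /** The w:rFonts attribute a piece of text would be formatted with (simplified). */
    enum FontSlot { ASCII, EAST_ASIA, CS, H_ANSI }

    /** A contiguous piece of run text that uses a single font slot. */
    static final class Segment {
        final FontSlot slot;
        final String text;
        Segment(FontSlot slot, String text) { this.slot = slot; this.text = text; }
        @Override public String toString() { return slot + ":" + text; }
    }

    static FontSlot classify(int codePoint) {
        if (codePoint < 128) return FontSlot.ASCII;
        switch (Character.UnicodeScript.of(codePoint)) {
            case HAN: case HIRAGANA: case KATAKANA: case HANGUL: case BOPOMOFO:
                return FontSlot.EAST_ASIA;
            case ARABIC: case HEBREW: case SYRIAC: case THAANA: case THAI:
                return FontSlot.CS;
            default:
                return FontSlot.H_ANSI; // e.g. accented Latin, Cyrillic, Greek
        }
    }

    /** Splits run text into maximal segments whose code points share a font slot. */
    static List<Segment> split(String text) {
        List<Segment> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        FontSlot currentSlot = null;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            FontSlot slot = classify(cp);
            if (currentSlot != null && slot != currentSlot) {
                out.add(new Segment(currentSlot, current.toString()));
                current.setLength(0);
            }
            currentSlot = slot;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) out.add(new Segment(currentSlot, current.toString()));
        return out;
    }
}
```

Each segment would then become its own fo:inline (or HTML span) using the font named by the matching w:rFonts attribute.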

Remove System.out when calling OpenDoPEHandler.preProcess()

A stack trace gets printed to System.out when I call OpenDoPEHandler.preProcess().

Looking closely at the docx4j library source (we are using docx4j 2.8.1), the class http://grepcode.com/file/[email protected][email protected]@org$docx4j$model$datastorage$OpenDoPEHandler.java
contains System.out.println("newPath: " + newPath);

whereas in the previous version, 2.8.0, that statement is commented out: http://grepcode.com/file/repo1.maven.org/maven2/org.docx4j/docx4j/2.8.0/org/docx4j/model/datastorage/OpenDoPEHandler.java

Can you please take care of this in the next fix ?

OpenDoPE repeats: cross ref - bookmarks

Where the same repeat is used twice, once for summary and again for detail, it would be good if each instance of summary could cross ref to the corresponding instance of detail.

Suspect an enhancement may be required so that the bookmark in each instance of detail is given a new name/ID, and used appropriately from summary.
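One way such an enhancement might work: give the bookmark in each detail instance a derived name and a fresh w:id, and point the matching summary entry at it with a REF field. This is a minimal sketch, assuming a hypothetical naming scheme of base name plus zero-based instance index. RepeatBookmarks is not part of docx4j's OpenDoPE handling. Word limits bookmark names to 40 characters, so the sketch truncates the base name to fit.

```java
class RepeatBookmarks {

    private static final int MAX_NAME_LENGTH = 40; // Word's limit on bookmark names

    private int nextId;

    RepeatBookmarks(int firstFreeId) {
        this.nextId = firstFreeId;
    }

    /** Name for the bookmark in the given repeat instance, e.g. "detail_2". */
    static String nameFor(String baseName, int instance) {
        String suffix = "_" + instance;
        int keep = Math.min(baseName.length(), MAX_NAME_LENGTH - suffix.length());
        return baseName.substring(0, keep) + suffix;
    }

    /** Allocates a fresh w:id for the next cloned bookmarkStart/bookmarkEnd pair. */
    int allocateId() {
        return nextId++;
    }

    /** Field instruction for the summary entry; \h makes the result a hyperlink to the bookmark. */
    static String refField(String baseName, int instance) {
        return " REF " + nameFor(baseName, instance) + " \\h ";
    }
}
```

Truncation can make two long base names collide, so a real implementation would also need to check names against those already in the document.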

PageBreak class: w:p in content control

Unless these are processed, the page break won't appear in PDF output.

See http://www.docx4java.org/forums/docx-java-f6/page-breaks-in-pdf-s-t1128.html

Unable to load or export certain types of styles to HTML (styles that don't set a basedOn attribute)

I was having trouble exporting a .docx document to HTML; not all of the Word styles were exporting as CSS classes. When I stepped through StyleTree's constructor and the addNode(...) method, I saw that the missing style was set as that type's (paragraph) rootElement, then later overwritten by another style. I investigated further: my style was being overwritten because it doesn't specify a basedOn value. The problem has nothing to do with the conversion to CSS; the style isn't being properly loaded by StyleTree.

The StyleTree class is currently written in a way that assumes all of the styles for a given style type (table, paragraph, character) will have a single root style from which they inherit. Any <w:style> lacking a w:basedOn gets set as the root node for that type - i.e. the style that forms the base that child styles inherit/override. (In other words, the one that they or their parents are all basedOn).

It is possible to generate a .docx file using Word 2007 that has more than one style that would be considered the root for a type. If a user goes to Modify Style, they can select Style based on: (no style) or (underlying properties). (Note: this option is not available for the Normal style.) The user may create a document with an arbitrary number of these non-inheriting styles.

Given the above, one possible fix would be to modify StyleTree to use multiple trees per style type (since other styles might in turn be based on, i.e. inherit from, these additional non-inheriting styles).

For instance, please see the image below in which I set Emphasis' basedOn to none via the Modify Styles dialog in Word 2007.
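The multiple-trees idea can be sketched independently of docx4j's Tree/Node classes. Every style of one type whose basedOn chain ends without a parent becomes the root of its own tree, instead of replacing a single shared root. The sketch below is plain Java with a hypothetical Style stand-in holding only the styleId and basedOn; it is not the real StyleTree code. It also treats a basedOn that names an unknown style as a root, which is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StyleForest {
    /** Minimal stand-in for a w:style of one type: its styleId and w:basedOn (null if absent). */
    static final class Style {
        final String id;
        final String basedOn;

        Style(String id, String basedOn) {
            this.id = id;
            this.basedOn = basedOn;
        }
    }

    /**
     * Groups styles into one tree per root: root style id -> ids of the root and of every
     * style whose basedOn chain leads to it. A style without basedOn starts its own tree.
     */
    static Map<String, List<String>> build(List<Style> styles) {
        Map<String, String> parentOf = new HashMap<>();
        for (Style s : styles) {
            parentOf.put(s.id, s.basedOn);
        }
        Map<String, List<String>> trees = new LinkedHashMap<>();
        for (Style s : styles) {
            String current = s.id;
            Set<String> visited = new HashSet<>(); // guards against basedOn cycles
            // Climb while the current style has a known parent.
            while (parentOf.get(current) != null
                    && parentOf.containsKey(parentOf.get(current))
                    && visited.add(current)) {
                current = parentOf.get(current);
            }
            trees.computeIfAbsent(current, k -> new ArrayList<>()).add(s.id);
        }
        return trees;
    }
}
```

With Emphasis's basedOn set to none, Emphasis and anything based on it end up in their own tree, while the other character styles stay under DefaultParagraphFont.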

<font size="1"> tag is not supported in docx4j ConvertInXHTML

Hi, I am very new to this API; I must say it's great work and I really appreciate it.
I am facing the following issue:
I have an HTML template which is converted into DOCX, and that works fine, but I have used some tags like <font size="1">, and in the converted DOCX the text is still displayed at the same size, i.e. size 11.
Can anyone please tell me what I need to change in my HTML template so that I get small fonts only for a specific paragraph in the converted DOCX?

I would highly appreciate a prompt reply.
Thanks in advance,
Vrinda
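A possible workaround is to express the size in CSS rather than with the legacy <font> tag. This assumes the importer honours the CSS font-size property, which I haven't verified; an explicit value such as style="font-size:8pt" may be the safer choice. The sketch below rewrites <font size="N"> tags into spans, using HTML's legacy mapping of sizes 1 to 7 onto the CSS keywords x-small through xxx-large. It is plain regex-based string rewriting with a hypothetical class name. It only handles <font> tags whose sole attribute is size, and it assumes every </font> closes such a tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FontTagRewriter {
    // HTML's legacy mapping: <font size=N> for N = 1..7 corresponds to these CSS keywords.
    private static final String[] CSS_SIZE = {
        null, "x-small", "small", "medium", "large", "x-large", "xx-large", "xxx-large"
    };

    // Matches an opening <font> tag whose only attribute is size="1".."7".
    private static final Pattern FONT_OPEN =
        Pattern.compile("<font\\s+size=\"([1-7])\"\\s*>", Pattern.CASE_INSENSITIVE);

    /** Rewrites <font size="N">...</font> into <span style="font-size:...">...</span>. */
    static String rewrite(String xhtml) {
        Matcher m = FONT_OPEN.matcher(xhtml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String keyword = CSS_SIZE[Integer.parseInt(m.group(1))];
            m.appendReplacement(sb, "<span style=\"font-size:" + keyword + "\">");
        }
        m.appendTail(sb);
        // Assumes every </font> closes one of the tags rewritten above.
        return sb.toString().replaceAll("(?i)</font>", "</span>");
    }
}
```

Writing the span directly in the template avoids the rewriting step altogether; the helper is only useful if the template can't be changed by hand.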

wp:docPr and dml-wordprocessingDrawing.xsd

It looks like JAXB classes generated from this schema will expect docPr in the http://schemas.openxmlformats.org/drawingml/2006/main namespace, not in http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing

See further http://www.docx4java.org/forums/topic1086.html

hyperlinks disappear in ConvertInXHTMLFile

When converting from docx to XHTML, the hyperlink text is duplicated:

  <div class="document">

  <p class="Heading1 Normal DocDefaults "><span class="DefaultParagraphFont ">This is a test</span></p>

  <p class="Normal DocDefaults "> </p>

  <p class="Normal DocDefaults "><a href="http://example.com"><span class="Hyperlink DefaultParagraphFont ">example1</span><span class="Hyperlink DefaultParagraphFont ">example1</span></a></p>

  <p class="Normal DocDefaults "> </p>

  <p class="Normal DocDefaults "><a href="http://example.com"><span class="Hyperlink DefaultParagraphFont ">example2</span><span class="Hyperlink DefaultParagraphFont ">example2</span></a></p>

  <p class="Normal DocDefaults "> </p>

  <p class="Normal DocDefaults "><a href="http://example.com"><span class="Hyperlink DefaultParagraphFont ">example3</span><span class="Hyperlink DefaultParagraphFont ">example3</span></a></p>

  <p class="Normal DocDefaults "> </p>

  <p class="Normal DocDefaults "> </p></div>

When converting back from XHTML to docx, the hyperlinks disappear:

<w:body>
        <w:p>
            <w:pPr>
                <w:keepNext/>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b/>
                    <w:i w:val="false"/>
                    <w:color w:val="345a8a"/>
                    <w:sz w:val="32"/>
                </w:rPr>
                <w:t>This is a test</w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:pPr>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b w:val="false"/>
                    <w:i w:val="false"/>
                    <w:color w:val="000000"/>
                    <w:sz w:val="24"/>
                </w:rPr>
                <w:t> </w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:pPr>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b w:val="false"/>
                    <w:i w:val="false"/>
                    <w:color w:val="000000"/>
                    <w:sz w:val="24"/>
                </w:rPr>
                <w:t> </w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:pPr>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b w:val="false"/>
                    <w:i w:val="false"/>
                    <w:color w:val="000000"/>
                    <w:sz w:val="24"/>
                </w:rPr>
                <w:t> </w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:pPr>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b w:val="false"/>
                    <w:i w:val="false"/>
                    <w:color w:val="000000"/>
                    <w:sz w:val="24"/>
                </w:rPr>
                <w:t> </w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:pPr>
                <w:spacing w:after="0"/>
                <w:ind w:left="0"/>
                <w:jc w:val="left"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:rFonts w:hAnsi="serif" w:ascii="serif"/>
                    <w:b w:val="false"/>
                    <w:i w:val="false"/>
                    <w:color w:val="000000"/>
                    <w:sz w:val="24"/>
                </w:rPr>
                <w:t> </w:t>
            </w:r>
        </w:p>
        <w:sectPr>
            <w:pgSz w:code="1" w:h="15840" w:w="12240"/>
            <w:pgMar w:gutter="" w:footer="" w:header="" w:left="1440" w:bottom="1440" w:right="1440" w:top="1440"/>
        </w:sectPr>
    </w:body>

HTML importer doesn't work for images in headers

We have tried to use the XHTMLImporter class to add XHTML to headers and footers.
The XHTML contains some images, and those are missing in the docx: the document shows the message 'This image cannot currently be displayed'.

After debugging the code, we realized that the relationships for the images are added to document.xml.rels by default. In this case, they should be added to header.xml.rels.

Solution:
In the AddImage() method of XHTMLImporter, rather than using
BinaryPartAbstractImage.createImagePart(wordMLPackage, imageBytes);

we should use:
BinaryPartAbstractImage.createImagePart( wordMLPackage, sourcePart, imageBytes );

So, to use XHTMLImporter to add HTML to a header or footer, we would probably need to pass the sourcePart to the XHTMLImporter class.

If there is another way, please let me know.

Thanks

Bug in BinaryPart's handling of parts smaller than 1024 bytes

There seems to be a bug in BinaryPart.writeDataToOutputStream(): it incorrectly adds bytes to the part in the output stream. This happens when the part is smaller than 1024 bytes, in which case the extra "garbage" bytes from the rest of the buffer are also written to the output.

I think the correct implementation would be something like the following (I'm not absolutely sure, so please correct it if needed):

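import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPart;

/**
 * Copy this part's binary data (the bytes from 0 up to the buffer's
 * limit, not the whole backing array) to an output stream.
 *
 * @param bpart the binary part whose data should be written
 * @param out   the stream to write to
 * @throws IOException if writing fails
 */
private void writeDataToOutputStream(BinaryPart bpart, OutputStream out) throws IOException {
    // Work on a duplicate so the part's own position/limit are left untouched.
    // Calling clear() on the original would reset its limit to the capacity,
    // so a second call would write the whole buffer again.
    ByteBuffer buf = bpart.getBuffer().duplicate();
    buf.rewind();
    byte[] bytes = new byte[buf.remaining()];
    buf.get(bytes);

    out.write(bytes);
}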

So instead of writing out the whole buffer, only the meaningful bytes should be written (or should the buffer be compacted first?). It's probably worth checking whether this happens elsewhere too.
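The difference between writing the backing array and writing only the data is easy to demonstrate with a plain java.nio.ByteBuffer. After 5 bytes are put into a 1024-byte buffer and it is flipped, array() still has 1024 elements, whereas copying from index 0 up to the limit yields just the 5. The sketch assumes, as the proposal above does, that the part's data starts at index 0 and ends at the buffer's limit; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class BufferCopy {
    /** Writes the bytes from index 0 up to buf's limit, without changing buf's position or limit. */
    static void writeContent(ByteBuffer buf, OutputStream out) throws IOException {
        ByteBuffer view = buf.duplicate(); // shares the content, has its own position/limit
        view.rewind();                     // position = 0; limit stays where the data ends
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        out.write(bytes);
    }
}
```

Working on duplicate() means repeated calls give the same result and the original buffer's limit is never reset to its capacity.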

xlsx - shared string table - xml:space="preserve" is dropped

becomes

because the schema defines

<xsd:simpleType name="ST_Xstring">
  <xsd:annotation>
    <xsd:documentation>Escaped String</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string" />
</xsd:simpleType>

(i.e. it doesn't allow that attribute)
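Whatever the fix on the JAXB side, a writer has to emit xml:space="preserve" on <t> when the value's whitespace matters. The sketch below covers only the output side in plain Java string building, with a hypothetical class name. It takes "starts or ends with whitespace" as the criterion, which is an assumption: those are the values a consumer such as Excel may trim when the attribute is missing. How to make the generated JAXB class carry the attribute is not addressed.

```java
public class SharedStringText {
    /** Assumption: leading/trailing whitespace is what may be trimmed without xml:space="preserve". */
    static boolean needsPreserve(String s) {
        return !s.isEmpty()
            && (Character.isWhitespace(s.charAt(0)) || Character.isWhitespace(s.charAt(s.length() - 1)));
    }

    /** Escapes the characters that are not allowed literally in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Serialises a shared-string <t> element, adding xml:space="preserve" only when needed. */
    static String tElement(String s) {
        String open = needsPreserve(s) ? "<t xml:space=\"preserve\">" : "<t>";
        return open + escape(s) + "</t>";
    }
}
```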

BinaryPartAbstractImage error: java.net.MalformedURLException: no protocol

When using Java 6 on Linux, I cannot load an image. It appears to work fine on Windows.

Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Error checking image format
        at org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage.ensureFormatIsSupported(BinaryPartAbstractImage.java:429)
        at org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage.ensureFormatIsSupported(BinaryPartAbstractImage.java:331)
        at org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage.createImagePart(BinaryPartAbstractImage.java:225)
        ... 
Caused by: java.net.MalformedURLException: no protocol: /tmp/img8224977656539125863.img
        at java.net.URL.<init>(URL.java:567)
        at java.net.URL.<init>(URL.java:465)
        at java.net.URL.<init>(URL.java:414)
        at org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage.ensureFormatIsSupported(BinaryPartAbstractImage.java:421)
        ...

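I believe the URL needs a file: protocol. Prefixing the absolute path with "file://" works on Linux, but File.toURI().toURL() is more portable (it also handles Windows drive letters and spaces in paths).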

From BinaryPartAbstractImage.java line 415:

                fos.close();
                fos = null;

                // We need to refresh image info 
                imageManager.getCache().clearCache();
                info = getImageInfo(new URL(imageFile.getAbsolutePath()));

                // Debug ...
                displayImageInfo(info);

e.g.

                fos.close();
                fos = null;

                // We need to refresh image info 
                imageManager.getCache().clearCache();
                // toURI().toURL() yields a well-formed file: URL on any platform,
                // e.g. file:/tmp/img8224977656539125863.img on Linux
                info = getImageInfo(imageFile.toURI().toURL());

                // Debug ...
                displayImageInfo(info);

Unfortunately I'm not near a machine where I can build and verify this fix (yet)
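Here is a quick way to see the failure and the portable alternative side by side, using only java.io.File and java.net.URL (the class and method names are hypothetical). Passing a bare absolute path to new URL(...) gives it no protocol, so it throws, while File.toURI().toURL() always produces a file: URL. On Linux the first call fails with the same "no protocol" message as in the stack trace above.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class FileUrls {
    /** Reproduces the bug: returns the MalformedURLException message, or null if no exception. */
    static String tryBarePath(File f) {
        try {
            new URL(f.getAbsolutePath());
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    /** Portable alternative: let java.io.File build a well-formed file: URL. */
    static URL toUrl(File f) throws MalformedURLException {
        return f.toURI().toURL();
    }
}
```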

PDF: duplicate hyperlink

In the following code path, treeCopy is called twice:

JAXBException when loading a .docx file

Hi, I'm getting an exception when loading a .docx file.
When I tried opening a newly created .docx file, the stack trace didn't appear.
I deleted everything from the problematic .docx file, and the stack trace still appeared. Looking at the underlying XML, Word keeps a relationship to the footer even when I ask it to delete the footer. I think the footer is the problematic part.
If I open the document twice, the stack trace doesn't appear the second time.
Any ideas?
Might be related to this issue: #19
Here is the stack trace:

ERROR , Cannot initialize context
javax.xml.bind.JAXBException: "org.plutext.jaxb.xmldsig" doesnt contain ObjectFactory.class or jaxb.index
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:186)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:128)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:290)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:372)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:337)
at org.docx4j.jaxb.Context.<clinit>(Context.java:97)
at org.docx4j.openpackaging.contenttype.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:658)
at org.docx4j.openpackaging.io.LoadFromZipNG.process(LoadFromZipNG.java:206)
at org.docx4j.openpackaging.io.LoadFromZipNG.get(LoadFromZipNG.java:193)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:301)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:245)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:195)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:178)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:172)

Coding standards and conventions

Does docx4j follow any coding standards or conventions? If not, I think now could be the time to adopt some. What do you think?

Add support for Repeating Section Content Control in Word 2013

Word 2013 introduces the Repeating Section Content Control. It would be very useful if docx4j evaluated this control and populated it with the repeating data from the custom XML that is bound to it. I think this could be used instead of the OpenDoPE implementation of repeating data.

Please refer to RepeatingSectionContentControl.zip for an example document. It includes:

Order_ContentControls.docx: a very simple order report containing a single order number and the aforementioned Repeating Section Content Control for the OrderLine items.
OrderExample1.xml: the xml data that was loaded into the Word document.
Order.xsd: the schema for the xml data.

See the question on StackOverflow that resulted in this issue.
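A first step toward supporting this control is recognising it. In Word 2013 markup, a repeating section is an sdt whose sdtPr contains w15:repeatingSection, and each repeated item is a nested sdt marked with w15:repeatingSectionItem. The sketch below only counts those markers using the JDK's DOM parser; the class and method are hypothetical. Populating the items from the bound custom XML, which is what this issue actually asks for, is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class RepeatingSections {
    // Word 2013 (w15) namespace in which the repeating-section markers live.
    static final String W15 = "http://schemas.microsoft.com/office/word/2012/wordml";

    /** Counts w15 elements with the given local name in a document.xml string. */
    static int countW15(String documentXml, String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DTDs, a common hardening step when parsing untrusted documents.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagNameNS(W15, localName).getLength();
    }
}
```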

Where will the XhtmlImporter go?

The XhtmlImporter was a nice tool for getting from .xhtml to .docx.
It had only basic features, but it was fine for such basic requirements.
I searched the repos but couldn't find it anymore.

regards, Willi

tables: w:jc should trump w:tblInd

Per the spec: "If the resulting justification on any table row is not left after applying the value of the jc element from the three levels of this property (§17.4.26; §17.4.27; §17.4.28), then this property [w:tblInd] shall be ignored."

At present, docx4j is not ignoring w:tblInd in these circumstances, so the test docx at src/test/resources/tables is being rendered differently in PDF compared to Word 2010.

Note that the "three levels" include not just w:tblPr, but also each individual row's w:trPr.
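The rule amounts to: resolve each row's justification, and apply w:tblInd only if every row comes out as left. The plain-Java sketch below models just two of the three levels. It assumes that a row-level w:jc (in w:trPr) overrides the table-level one (in w:tblPr) and that "unset everywhere" means left. The class and method names are hypothetical, and the spec's third level is not modelled.

```java
import java.util.List;

public class TableIndent {
    /** Resolved justification for one row; null means "not set" at that level. */
    static String rowJustification(String tableJc, String rowJc) {
        if (rowJc != null) return rowJc;     // assumption: row-level w:jc wins
        if (tableJc != null) return tableJc; // otherwise the table-level w:jc
        return "left";                       // assumption: default is left
    }

    /** w:tblInd (in twips) applies only if every row resolves to left; otherwise it is ignored. */
    static int effectiveIndent(int tblInd, String tableJc, List<String> rowJcs) {
        for (String rowJc : rowJcs) {
            if (!"left".equals(rowJustification(tableJc, rowJc))) {
                return 0; // per the quoted rule, w:tblInd is ignored
            }
        }
        return tblInd;
    }
}
```

Checking every row matters because a single centred row is enough to cancel the indent for the whole table.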

remove System.out.println statements

In the static initializer below, there are System.out.println statements whose output appears on the console without any control by the calling application. To control the user experience, I'd suggest that being able to control whether these messages appear on the console is a required feature. Could they be changed to use the logger, or could a docx4j.properties setting control whether they are printed?

import javax.xml.bind.JAXBContext;
import org.apache.log4j.Logger;

public class Context {

    /*
     * Two reasons for having a separate class for this:
     * 1. so that loading SML context does not slow
     *    down docx4j operation on docx files
     * 2. to try to maintain clean delineation between
     *    docx4j and xlsx4j
     */

    public static JAXBContext jcSML;

    private static Logger log = Logger.getLogger(Context.class);

    static {

        // Display diagnostic info about version of JAXB being used.
        Class c;
        try {
            c = Class.forName("com.sun.xml.bind.marshaller.MinimumEscapeHandler");
            System.out.println("JAXB: Using RI");
        } catch (ClassNotFoundException cnfe) {
            // JAXB Reference Implementation not present
            System.out.println("JAXB: RI not present.  Trying Java 6 implementation.");
            try {
                c = Class.forName("com.sun.xml.internal.bind.marshaller.MinimumEscapeHandler");
                System.out.println("JAXB: Using Java 6 implementation.");
            } catch (ClassNotFoundException e) {
                System.out.println("JAXB: neither Reference Implementation nor Java 6 implementation present?");
            }
        }

        // ... (rest of the static initializer omitted in this excerpt)
    }
}
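One way to address the request: keep the same detection order, but build the message as a string and hand it to a logger instead of System.out. That way the host application's logging configuration decides whether the message appears. The sketch uses the JDK's java.util.logging so it runs standalone; it stands in for whatever logger docx4j itself uses (the excerpt above already declares a log4j-style Logger). The class and method names are hypothetical.

```java
import java.util.logging.Logger;

public class JaxbDiagnostics {
    private static final Logger LOG = Logger.getLogger(JaxbDiagnostics.class.getName());

    /** True if the named class can be found, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JaxbDiagnostics.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Same detection order as the static initializer above, but returns the text instead of printing it. */
    static String describeJaxbImplementation() {
        if (isPresent("com.sun.xml.bind.marshaller.MinimumEscapeHandler")) {
            return "JAXB: Using RI";
        }
        if (isPresent("com.sun.xml.internal.bind.marshaller.MinimumEscapeHandler")) {
            return "JAXB: Using Java 6 implementation.";
        }
        return "JAXB: neither Reference Implementation nor Java 6 implementation present?";
    }

    static {
        // FINE is below the default INFO threshold, so nothing reaches the console unless enabled.
        LOG.fine(describeJaxbImplementation());
    }
}
```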

Headers have some fixed min-size

Headers seem to have a fixed minimum height; it is not possible to make them shorter, so they are just too big and ugly. Moving the header closer to the top just makes it look like there is a big space after the text in my header.
This is how it looks:

I only use this simple function:
void generatePDF(String path) {

    boolean save = true;
    WordprocessingMLPackage wordMLPackage;

    try {
        wordMLPackage = WordprocessingMLPackage.load(new java.io.File(path));

        Mapper fontMapper = new IdentityPlusMapper();
        wordMLPackage.setFontMapper(fontMapper);

        PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Arial");
        fontMapper.getFontMappings().put("Helvetica-Bold", font);

        org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

        if (save) {
            OutputStream os = new java.io.FileOutputStream(CONSTANTS.pdfOutputPath);
            try {
                c.output(os, new PdfSettings());
            } finally {
                // Close the stream even if the conversion fails.
                os.close();
            }
            System.out.println("Saved " + CONSTANTS.pdfOutputPath);
        }

    } catch (Docx4JException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Paragraph format is lost when the paragraph contains a MERGEFIELD.

When a document contains a paragraph that includes a MERGEFIELD instr element and a mail merge is performed with MailMerger.java, then the formatting of the rest of the text in this paragraph is not maintained.

The problem resides in the FieldsPreprocessor.canonicalise(...) implementation. I already have a workaround ready. I could attach it somewhere, or send it to the team so they can evaluate it and integrate it properly within their source-control regime.

MergeField: multiline result

Need to be able to create a field result such as:

                <w:r>
                    <w:fldChar w:fldCharType="separate"/>
                    <w:t>«Line 1</w:t>
                    <w:br/>
                    <w:t>Line 2»</w:t>
                    <w:fldChar w:fldCharType="end"/>
                </w:r>

i.e. convert \n into a w:br.
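Converting \n into w:br amounts to splitting the merge value on line breaks and putting a w:br between consecutive w:t elements. The sketch below builds that run content as a WordprocessingML string. It is plain string building with a hypothetical class name, not docx4j's MailMerger code; inside docx4j one would presumably create the equivalent Text and Br objects through org.docx4j.wml.ObjectFactory instead. xml:space="preserve" keeps leading and trailing spaces on each line.

```java
public class MultilineFieldResult {
    /** Escapes the characters that are not allowed literally in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the run content between the "separate" and "end" fldChars, one w:t per line, w:br between. */
    static String toRunContent(String value) {
        String[] lines = value.split("\r?\n", -1); // -1 keeps trailing empty lines
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                sb.append("<w:br/>");
            }
            sb.append("<w:t xml:space=\"preserve\">").append(escapeXml(lines[i])).append("</w:t>");
        }
        return sb.toString();
    }
}
```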

PDF conversion doesn't include text formatting

The XSLFO PdfConversion successfully outputs the text of a WordprocessingMLPackage, but the text formatting (bold, italics, etc) is lost. See https://gist.github.com/jerryorr/4999571 for an example

Docx4j does not update the cached image sizes when the content of the image file changes

If I generate a document A that includes an image file a.png, and then overwrite a.png with an image of a different size before generating a second document B, the dimensions of the image in B are the same as in A.

The problem seems to come from the static BinaryPartAbstractImage.imageManager, which stores information about each image the first time it is used but does not invalidate that information if the file has changed.

The best workaround I have found that does not require modifying docx4j is to create a dummy class like this:

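package org.docx4j.openpackaging.parts.WordprocessingML;

// Lives in docx4j's own package so that it can reach
// BinaryPartAbstractImage.imageManager, which presumably isn't public.
public class Workaround {
    /** Discards cached image info so that changed image files are re-read. */
    public static void clearImageCache() {
        BinaryPartAbstractImage.imageManager.getCache().clearCache();
    }
}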

Then I call Workaround.clearImageCache() before generating each document.
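An alternative to clearing the whole cache before every document is to make the cache key change when the file does. Below is a generic sketch in plain Java; it is a hypothetical class, not docx4j's image-manager cache. Entries are keyed on the absolute path, length and last-modified time, so rewriting a.png produces a cache miss. Caveats: lastModified has coarse granularity on some file systems, so a same-size rewrite within that window could still hit a stale entry, and this sketch never evicts old entries.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FileAwareCache<V> {
    private final Map<String, V> entries = new HashMap<>();
    private final Function<File, V> loader;
    int loads = 0; // how many times the loader actually ran (for illustration)

    FileAwareCache(Function<File, V> loader) {
        this.loader = loader;
    }

    // Size and modification time are part of the key, so a rewritten file gets a new entry.
    private static String keyFor(File f) {
        return f.getAbsolutePath() + "|" + f.length() + "|" + f.lastModified();
    }

    V get(File f) {
        return entries.computeIfAbsent(keyFor(f), k -> {
            loads++;
            return loader.apply(f);
        });
    }
}
```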

Build failure

[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] \Temp\4\docx4j\src\main\java\org\docx4j\convert\out\AbstractTableWriter.java:[50,43] error: cannot find symbol
[ERROR] \Temp\4\docx4j\src\main\java\org\docx4j\convert\out\AbstractTableWriter.java:[343,22] error: cannot find symbol

This was probably caused by changes introduced in commit 5f52ecd.

PS. I'm fairly new to this whole GitHub thing; I just registered to send one pull request. Now I'm trying to make sure it builds on the master branch. My changes (05fdacb) are quite trivial, so maybe I should send my pull request regardless of the current build status?