
docx-rs's People

Contributors

cstkingkey, dependabot-preview[bot], driver-deploy-2, dsgallups, eflanagan0, efx, franshk, oovm, poiscript, questionablem

docx-rs's Issues

Files in docx/zip unknown to docx-rs

Hey there
I've taken an interest in working with this crate, but I've found that files inside a Word zip that the crate doesn't recognize get deleted. In other words, a read/write round trip is not lossless. Running the simple example from the README, my file loses a lot of information simply because some files are not preserved.

The same goes for elements in the XML: those unknown to this crate are skipped. For my use case this is not acceptable, so I want to change these behaviours.

For reference, in my test example, the Word file initially has the following file structure and sizes:

2746    ./[Content_Types].xml
1107    ./docProps/app.xml
1060    ./docProps/custom.xml
861     ./docProps/core.xml
219     ./customXml/item1.xml
1541    ./customXml/item2.xml
1637    ./customXml/item3.xml
17549   ./customXml/item4.xml
296     ./customXml/_rels/item4.xml.rels
296     ./customXml/_rels/item3.xml.rels
296     ./customXml/_rels/item2.xml.rels
296     ./customXml/_rels/item1.xml.rels
321     ./customXml/itemProps3.xml
608     ./customXml/itemProps2.xml
335     ./customXml/itemProps1.xml
1271    ./customXml/itemProps4.xml
737     ./_rels/.rels
3264    ./word/fontTable.xml
16026   ./word/header2.xml
109783  ./word/document.xml
4712    ./word/header1.xml
11419   ./word/settings.xml
19868   ./word/numbering.xml
4356    ./word/footer1.xml
1697    ./word/webSettings.xml
37414   ./word/styles.xml
7074    ./word/theme/theme1.xml
2172    ./word/_rels/document.xml.rels
439     ./word/_rels/settings.xml.rels
2913    ./word/endnotes.xml
2919    ./word/footnotes.xml

And after simply reading and writing (no change), the resulting Word document will have the following internal file structure:

2746    ./[Content_Types].xml
792     ./docProps/app.xml
470     ./docProps/core.xml
597     ./_rels/.rels
1214    ./word/fontTable.xml
1740    ./word/header2.xml
82047   ./word/document.xml
1597    ./word/header1.xml
8545    ./word/settings.xml
1304    ./word/footer1.xml
247     ./word/webSettings.xml
36240   ./word/styles.xml
1338    ./word/_rels/document.xml.rels
457     ./word/endnotes.xml
463     ./word/footnotes.xml

which is a lot less: 15 files instead of 31, and most of the remaining files are smaller.

My idea would be to keep a list of unknown files which gets written as-is when writing the file. What do you think?
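The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how docx-rs is implemented: the `Package` struct, the `KNOWN_PARTS` list, and both method names are hypothetical, and known parts are kept as raw bytes here instead of being parsed into typed structs.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of a .docx package (not a docx-rs type).
#[derive(Debug, Default)]
pub struct Package {
    /// Parts the library would parse into typed structs (raw bytes here for brevity).
    pub known: BTreeMap<String, Vec<u8>>,
    /// Parts the library does not understand, carried through untouched.
    pub unknown: BTreeMap<String, Vec<u8>>,
}

/// Illustrative list only; an assumption, not the crate's real set of parts.
const KNOWN_PARTS: &[&str] = &["[Content_Types].xml", "word/document.xml", "word/styles.xml"];

impl Package {
    /// Split the zip entries into modelled parts and pass-through parts.
    pub fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (String, Vec<u8>)>,
    {
        let mut pkg = Package::default();
        for (name, bytes) in entries {
            if KNOWN_PARTS.contains(&name.as_str()) {
                pkg.known.insert(name, bytes);
            } else {
                pkg.unknown.insert(name, bytes);
            }
        }
        pkg
    }

    /// Produce every entry again, so unknown parts are written back as-is.
    pub fn into_entries(self) -> Vec<(String, Vec<u8>)> {
        self.known.into_iter().chain(self.unknown).collect()
    }
}
```

With this shape, a part such as customXml/item1.xml would land in `unknown` on read and come back byte-for-byte from `into_entries` on write. Relationship and content-type entries that point at those parts would need to be preserved as well, which this sketch does not address.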

As for the XML, the initial word/document.xml I'm testing against is 109783 bytes, while the written file is 82047 bytes, so many elements get swallowed. I would hope the same approach works here, but I'm not sure how flexible the underlying strong-xml library is. An alternative would be to include all possible elements in the structs, even if they're not used or exposed.
To find a list of all possible elements, we could either read the specs or I could analyze the large number of Word files that I have access to.
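For the element-level case, one possible shape is a catch-all variant that stores the raw XML of anything the model does not cover. The sketch below is hand-written and hypothetical: `BodyChild`, `serialize`, and `escape` are made-up names, and whether strong-xml can capture raw unknown elements like this is exactly the open question raised above.

```rust
/// Hypothetical child of the document body (not a docx-rs type).
#[derive(Debug, Clone, PartialEq)]
pub enum BodyChild {
    /// A modelled element; only its text is kept here for brevity.
    Paragraph(String),
    /// Raw XML of an element the model does not know, kept verbatim.
    Unknown(String),
}

/// Minimal escaping for text content.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Write children in their original order; unknown elements pass through unchanged.
pub fn serialize(children: &[BodyChild]) -> String {
    let mut out = String::new();
    for child in children {
        match child {
            BodyChild::Paragraph(text) => {
                out.push_str("<w:p><w:r><w:t>");
                out.push_str(&escape(text));
                out.push_str("</w:t></w:r></w:p>");
            }
            BodyChild::Unknown(raw) => out.push_str(raw),
        }
    }
    out
}
```

Compared with adding every possible element to the structs, a catch-all variant keeps the typed model small, at the cost of leaving unknown content opaque to callers.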

Support for SDT entry in TableCell/TableRow

Hello, I am working on a project that requires SDT entries in table cells/rows. I have made a fork of your repository, and my solution is a total hack. I was wondering if you would be interested in me refactoring my code and sending in a PR? Only if it's worth your time.

i.e.:

<w:tr>
  <w:sdt>
    ...
  </w:sdt>
</w:tr>

is not parsed correctly.
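To make the request concrete, here is a rough sketch of how a row's content could be modelled so that an SDT wrapper around cells is representable. All names (`TableCell`, `TableRowContent`, `cells`) are hypothetical and do not reflect the crate's actual types or the fork mentioned above; the SDT's properties are also left out.

```rust
/// Hypothetical, simplified table cell (text only).
#[derive(Debug, Clone, PartialEq)]
pub struct TableCell {
    pub text: String,
}

/// Hypothetical row content: either a plain cell or an SDT wrapping further content.
#[derive(Debug, Clone, PartialEq)]
pub enum TableRowContent {
    Cell(TableCell),
    /// Stands in for a w:sdt element; its inner items are the wrapped row content.
    Sdt(Vec<TableRowContent>),
}

/// Collect all cells of a row in document order, looking inside SDT wrappers.
pub fn cells(content: &[TableRowContent]) -> Vec<&TableCell> {
    let mut out = Vec::new();
    for item in content {
        match item {
            TableRowContent::Cell(cell) => out.push(cell),
            TableRowContent::Sdt(inner) => out.extend(cells(inner)),
        }
    }
    out
}
```

Code that only cares about cells can call `cells` and ignore the wrappers, while a writer can still emit the SDT structure because it is kept in the tree.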

Document Properties not being updated / Quick parts are removed

When I update document properties, the value is changed correctly in the written XML, but there must be a quirk in how these files are written back, since the change does not show up in the document.

For example, I take a base document, change the subject to "TEST" and write it to a new file. When I open the file in Word, the subject is still empty. However, after unzipping the .docx manually (unzip test.docx -d unzipped) and looking at core.xml, I can see the updated value TEST.

As a side note (possibly this should be a separate issue), document quick parts (which get filled in via document properties) are lost when working from a preexisting document. Any quick parts in the body content are stripped out entirely, so there is a potential loss of data.

Table Borders for Left + Right

Would it be possible to implement the code changes found here:
PoiScript#31

Seems like only Top + Bottom borders are currently present, and this PR would add the ability to set borders for all sides.

Error when parsing relativeFrom

I have a docx file (which unfortunately I cannot share here) that fails to parse with the following error:

Result::unwrap() on an Err value: Xml(FromStr("Unkown Value. Found page, Expected \"column\", \"row\", \"paragraph\","))

It appears that the RelativeFrom enum (pub enum RelativeFrom) is missing some variants.

I'm not sure which parts of this are code-gen'd from the spec, or which version of the spec we're targeting, but it appears there are a few more valid values for this attribute: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.drawing.wordprocessing.horizontalrelativepositionvalues?view=openxml-2.8.1
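As a sketch of what a fuller mapping might look like for the horizontal case, the enum below covers the values listed on the linked page (margin, page, column, character, leftMargin, rightMargin, insideMargin, outsideMargin). The type name `HorizontalRelativeFrom` and the hand-written `FromStr` are assumptions for illustration; the crate's real `RelativeFrom` may be shared between horizontal and vertical positioning and is likely derived differently.

```rust
use std::str::FromStr;

/// Hypothetical enum for the horizontal relativeFrom attribute (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HorizontalRelativeFrom {
    Margin,
    Page,
    Column,
    Character,
    LeftMargin,
    RightMargin,
    InsideMargin,
    OutsideMargin,
}

impl FromStr for HorizontalRelativeFrom {
    type Err = String;

    /// Map the attribute text to a variant; anything else is reported as an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "margin" => Ok(Self::Margin),
            "page" => Ok(Self::Page),
            "column" => Ok(Self::Column),
            "character" => Ok(Self::Character),
            "leftMargin" => Ok(Self::LeftMargin),
            "rightMargin" => Ok(Self::RightMargin),
            "insideMargin" => Ok(Self::InsideMargin),
            "outsideMargin" => Ok(Self::OutsideMargin),
            other => Err(format!("unknown relativeFrom value: {}", other)),
        }
    }
}
```

With such a mapping, the value page from the error above would parse successfully instead of causing a panic on unwrap.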
