Comments (9)

JorjMcKie avatar JorjMcKie commented on July 22, 2024

Please describe in English!

from pymupdf.

java668 avatar java668 commented on July 22, 2024

Using this tool to parse Chinese PDF documents produces garbled characters. Could you please take a look? Thank you very much. PDF document:
Python单元测试框架.pdf

JorjMcKie avatar JorjMcKie commented on July 22, 2024

This PDF is full of errors - see the following log produced when opening it:

import pymupdf
doc = pymupdf.open("Python (1).pdf")
print(pymupdf.TOOLS.mupdf_warnings())
format error: cannot recognize xref format
trying to repair broken xref
repairing PDF document
Bad or missing parent pointer in outline tree, repairing
... repeated 4 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 3 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 3 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing

When the document is then saved with only its first page, no PDF viewer or extraction tool can extract meaningful text from the result.

doc.select([0])
doc.ez_save("page1.pdf")

java668 avatar java668 commented on July 22, 2024

With pypdfium2 (https://github.com/pypdfium2-team/pypdfium2) the text can be extracted. Could you please take a look? Thank you very much.

JorjMcKie avatar JorjMcKie commented on July 22, 2024

Sorry - as I wrote: this file has severe defects.
Whether some tools can still extract content despite this is outside the scope of what we can deal with.

java668 avatar java668 commented on July 22, 2024

OK, thank you very much.

java668 avatar java668 commented on July 22, 2024

How can I determine whether this PDF has errors? Is there a corresponding API? Thank you very much

JorjMcKie avatar JorjMcKie commented on July 22, 2024

How can I determine whether this PDF has errors? Is there a corresponding API?

Some errors are already detected when the PDF is opened - as in this case, where the central cross-reference (xref) table is broken. MuPDF then tries to repair the file by generating a new xref table from a walk through the full file. This is usually accompanied by error and warning messages. Some of those are written to the console; the full set of messages is also available from pymupdf.TOOLS.mupdf_warnings() - as shown.

Whether a repair was attempted can be determined by checking doc.is_repaired.

Not all errors can be detected at open time, though. Some only show up when certain information, such as text, is extracted, or when the pages' visual appearance is rendered.
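
A minimal sketch of how these checks could be combined, using only the calls mentioned in this thread (pymupdf.open, pymupdf.TOOLS.mupdf_warnings(), doc.is_repaired) plus Tools.reset_mupdf_warnings() and page.get_text(). The helper names distinct_messages and inspect_pdf and the layout of the returned dictionary are made up for illustration and are not part of PyMuPDF. The sketch assumes the "... repeated N times..." lines are mere repetition markers that can be skipped.

```python
def distinct_messages(log):
    """Return the distinct messages of a MuPDF warnings log in first-seen order.

    Lines of the form '... repeated N times...' are treated as repetition
    markers (an assumption based on the log shown in this thread) and skipped.
    """
    seen = []
    for line in log.splitlines():
        line = line.strip()
        if not line or line.startswith("... repeated "):
            continue
        if line not in seen:
            seen.append(line)
    return seen


def inspect_pdf(path):
    """Hypothetical helper: open a PDF and collect what MuPDF complained about."""
    import pymupdf  # imported here so distinct_messages() works without PyMuPDF

    pymupdf.TOOLS.reset_mupdf_warnings()  # drop messages left over from earlier files
    doc = pymupdf.open(path)
    report = {
        "is_repaired": doc.is_repaired,  # True if MuPDF had to repair the file
        "open_messages": distinct_messages(pymupdf.TOOLS.mupdf_warnings()),
        "page_messages": {},
    }
    # Some problems only surface during extraction, so try each page as well.
    for page in doc:
        page.get_text()
        messages = distinct_messages(pymupdf.TOOLS.mupdf_warnings())
        if messages:
            report["page_messages"][page.number] = messages
    doc.close()
    return report
```

For the file in this thread, inspect_pdf("Python (1).pdf") would be expected to report is_repaired as True, since the log shows a repair of the xref table. An empty report does not prove the file is sound; it only means MuPDF did not complain during opening and text extraction.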

java668 avatar java668 commented on July 22, 2024

OK, thank you very much!
