Sure, thanks for your quick response. I will check and let you know.
This method returns the font name. Calling pymupdf.TOOLS.set_subset_fontnames(True)
makes it return the subset prefix as well.
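A minimal sketch of the call described above, assuming PyMuPDF is installed and that span font names are read from `page.get_text("dict")`. The helper names `iter_span_fonts` and `fonts_in_document` are hypothetical, and `pymupdf` is imported inside the second function only so the pure helper works without it. Note that `set_subset_fontnames` is a global switch that stays on for later extractions.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_span_fonts(page_dict: dict[str, Any]) -> Iterator[str]:
    """Yield the "font" value of every text span in a get_text("dict") result."""
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so default to an empty list.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield span["font"]


def fonts_in_document(path: str) -> set[str]:
    """Collect span font names, keeping the subset prefix (e.g. "ABCDEF+")."""
    import pymupdf  # imported lazily; requires PyMuPDF to be installed

    pymupdf.TOOLS.set_subset_fontnames(True)  # keep subset prefixes in span["font"]
    names: set[str] = set()
    with pymupdf.open(path) as doc:
        for page in doc:
            names.update(iter_span_fonts(page.get_text("dict")))
    return names
```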
BTW, please make sure to upgrade your Python version soon.
Python 3.8 will no longer be supported starting with a release in October.
Ceasing support means we will no longer build wheels for it and will stop accepting issues about it.
This worked. Thanks!
Can we get the encoding or the font's symbolic name for each span? Different encodings can be defined for the same base font, so the symbolic name would help tell them apart.
No, this is not possible. Fonts with identical names, down to the subset prefix "ABCDEF+", cannot be told apart.
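If the goal is only to see which fonts on a page share a name, a sketch like the following may help. It assumes PyMuPDF's `Page.get_fonts()` with its default 6-item tuples `(xref, ext, type, basefont, name, encoding)`; the helpers `ambiguous_basefonts` and `print_page_fonts` are hypothetical. This lists encodings per font at page level only; it does not connect a span to a particular font, which is consistent with the answer above.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def ambiguous_basefonts(page_fonts: Sequence[tuple]) -> list[str]:
    """Return basefont names used by more than one font entry on a page.

    Entries are expected in Page.get_fonts() order:
    (xref, ext, type, basefont, name, encoding).
    """
    counts = Counter(entry[3] for entry in page_fonts)
    return sorted(name for name, n in counts.items() if n > 1)


def print_page_fonts(path: str, pno: int = 0) -> None:
    """Print each font's basefont and encoding for one page (needs PyMuPDF)."""
    import pymupdf  # imported lazily; requires PyMuPDF to be installed

    with pymupdf.open(path) as doc:
        fonts = doc[pno].get_fonts()
        for xref, _ext, _ftype, basefont, _name, encoding in fonts:
            print(f"xref={xref} basefont={basefont!r} encoding={encoding!r}")
        print("names shared by several fonts:", ambiguous_basefonts(fonts))
```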
Can we get both the font name and the base font name from the span?
For example, for a span I need both
"font": "Calibri" and "BaseFont": "AFHYFG+Calibri".
Whether a font is a subset can be determined by whether its name starts with a prefix of 6 uppercase letters followed by a "+".
No other information is available.
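The rule above can be sketched in Python as follows; it also yields both values asked for in the previous comment, provided `span["font"]` was extracted with `TOOLS.set_subset_fontnames(True)`. `split_subset_prefix` and `font_fields` are hypothetical helpers, not PyMuPDF API.

```python
from __future__ import annotations

import re

# Subset tag: exactly 6 uppercase ASCII letters followed by "+" at the start of the name.
_SUBSET_TAG = re.compile(r"([A-Z]{6})\+(.+)", re.DOTALL)


def split_subset_prefix(fontname: str) -> tuple[str, str]:
    """Return (prefix, name); prefix is "" when the font is not a subset."""
    match = _SUBSET_TAG.fullmatch(fontname)
    if match is None:
        return "", fontname
    return match.group(1), match.group(2)


def font_fields(fontname: str) -> dict[str, str]:
    """Build the {"font": ..., "BaseFont": ...} pair from a prefixed span font name."""
    _prefix, name = split_subset_prefix(fontname)
    return {"font": name, "BaseFont": fontname}
```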
Is there a restriction on the number of characters in the subset font name?
For example, the internal structure had the following subset font name:
/BaseFont /ABCDFG+TimesNewRomanPSMT-BoldCond
but with
TOOLS.set_subset_fontnames(True)
set,
span["font"]
returned
ABCDFG+TimesNewRomanPSMT-BoldCo
The last two characters of the subset font name are missing.
Can you help me understand why this happened?
Yes, there is a built-in limit of 31 characters on the font name.
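A small worked check of that limit, using the name from the question: the full name is 33 characters, and its first 31 characters are exactly what `span["font"]` returned. `may_be_truncated` is a hypothetical heuristic; a name of exactly 31 characters may also be complete, and the missing characters cannot be recovered this way.

```python
FONTNAME_LIMIT = 31  # font name length limit stated in this thread


def may_be_truncated(reported: str, limit: int = FONTNAME_LIMIT) -> bool:
    """Heuristic: a name that reaches the limit may have been cut off."""
    return len(reported) >= limit


full = "ABCDFG+TimesNewRomanPSMT-BoldCond"  # /BaseFont value from the PDF
reported = full[:FONTNAME_LIMIT]            # what span["font"] showed
print(len(full), reported, may_be_truncated(reported))
```

Running this prints `33 ABCDFG+TimesNewRomanPSMT-BoldCo True`.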
Oh, is that so?
So even though the base font name in the internal structure has more than 31 characters, set_subset_fontnames(True) cuts it to 31 characters?
But what if I need the full-length base font name?
There is no way to do this, sorry.
That's OK.
I appreciate your quick response.