Comments (36)
Can you explain why you need this?
from ttf-parser.
I need to read all glyphs in a font and output some data about the glyph together with the code-point.
This would mean I would have to either iterate over all code-points, or get them out of the cmap.
I was surprised that there was basically no library for this, so I looked into it, and with a day of work I managed to cover all the formats already supported by this library.
I thought I'd at least offer to give back to you so that any future people looking to do this have a library that does this for them.
from ttf-parser.
Your implementation uses Vec, so this is already out of scope. It should be implemented via Iterator. I can write it, no problem.
The problem is that it's more complex than this. A font can have multiple cmap subtables at once, even with the same encoding, so I cannot guarantee char's uniqueness. Also, I'm not sure what to do with variation codepoints (subtable 14).
I have never seen a library that supports this either, so I guess it's a rather unusual task.
from ttf-parser.
For the variations I am not sure either, maybe something similar to glyph_variation_index?
Where you can look up all the variations for a codepoint?
from ttf-parser.
> Where you can look up all the variations for a codepoint?
Are you talking about a font or Unicode?
from ttf-parser.
All the variations of a codepoint supported by the font.
from ttf-parser.
In the subtable 14.
from ttf-parser.
Here is an implementation that just gets all the variation sequences.
It isn't very elegant with the Option on top of the vec, but you said you would remove the vecs anyways.
Would this be sufficient?
Also, what exactly would be the problem with multiple cmap tables at once? You have the same issue when mapping char->glyph, right?
from ttf-parser.
> Would this be sufficient?
I don't know your use case.
> You have the same issue when mapping char->glyph right?
I don't, because I'm using the first matched one.
Anyway, I don't mind implementing it, but we have to settle on the API and use Iterator/zero-allocation implementation instead of Vec.
from ttf-parser.
> I don't know your use case.
I was thinking more that, since this is your library, you should decide what is considered good enough. I personally am not dealing with variations, but if I had to in the future I think this would be a good implementation for my use case.
As for the duplicate code-points, I am not sure it is an actual issue: with the current implementation the GlyphID is also returned, so you can get the proper glyph. If there are duplicates in the cmaps, it seems to me that this is a font issue. If you do want it to be unique, we can store the table's index and use a function like get_table_index(c: char) -> usize to get the table with the first occurrence. If they match, it's the first occurrence; if not, we skip it.
For the API I think returning an iterator would be ideal, although I wouldn't know how to implement that. Maybe store the current index of every loop we're in, so you can restart all the loops from that index again?
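The first-occurrence check described above could look roughly like the following. This is a minimal sketch, not ttf-parser code: the Subtable alias is a hypothetical stand-in for a parsed cmap subtable, and get_table_index here takes the subtable list as an extra argument and returns an Option, unlike the signature floated in the comment. It also assumes that a codepoint appears at most once within a single subtable.

```rust
/// Hypothetical stand-in for one parsed cmap subtable:
/// a list of (codepoint, glyph id) pairs.
type Subtable = Vec<(u32, u16)>;

/// Index of the first subtable that contains a mapping for `codepoint`.
fn get_table_index(subtables: &[Subtable], codepoint: u32) -> Option<usize> {
    subtables
        .iter()
        .position(|table| table.iter().any(|&(c, _)| c == codepoint))
}

/// Walk every subtable, but keep a mapping only if it comes from the
/// first subtable that defines its codepoint.
fn unique_mappings(subtables: &[Subtable]) -> Vec<(u32, u16)> {
    let mut out = Vec::new();
    for (index, table) in subtables.iter().enumerate() {
        for &(codepoint, glyph) in table {
            if get_table_index(subtables, codepoint) == Some(index) {
                out.push((codepoint, glyph));
            }
        }
    }
    out
}
```

Note that every entry triggers a scan over the subtables, so the cost grows quickly for large fonts.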
from ttf-parser.
As for the implementation, I can write it for you. The problem is the API itself.
What about a method that will convert a glyph_id into a codepoint? Like Font::codepoint(glyph_id: GlyphId) -> Option<char>
In theory, it will be a bit slower, but it's easier to implement.
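To illustrate why a glyph-to-codepoint lookup would be slower than the forward direction, here is a minimal sketch under the assumption that the forward mappings are already available as a slice of pairs. The function name codepoint_for_glyph and the slice representation are hypothetical and not part of ttf-parser.

```rust
/// Linear scan over hypothetical (codepoint, glyph id) pairs.
/// Returns the first codepoint that maps to `glyph_id`, if any.
fn codepoint_for_glyph(mappings: &[(char, u16)], glyph_id: u16) -> Option<char> {
    mappings
        .iter()
        .find(|&&(_, glyph)| glyph == glyph_id)
        .map(|&(c, _)| c)
}
```

Since several codepoints can share one glyph, a lookup shaped like this reports only the first match.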
from ttf-parser.
If that is easy to implement for you it would certainly be good enough for me.
from ttf-parser.
Ok. I will take a look into it in a few days.
from ttf-parser.
I thought about this some more: what if, instead of returning an iterator, we call a callback function?
So instead of:
for x in Face::list_codepoints() {
    // code that handles/registers the codepoint
}
users would have to write:
Face::list_codepoints(|c: char, glyph: GlyphId| {
    // code that handles/registers the codepoint
});
It is basically the same code, and if you have an iterator/vec you are probably going to write a loop that does the exact same thing anyway.
With regard to duplicate glyphs, I think that most use cases for this would want to know every codepoint for a glyph.
The most obvious use case is a font viewer, where you have to show every codepoint, not just every glyph.
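As a rough illustration of the font-viewer case, the mappings could be grouped by glyph so that every codepoint of a glyph is kept. This is a sketch only; codepoints_by_glyph and the slice of pairs are hypothetical names, not an existing API.

```rust
use std::collections::BTreeMap;

/// Group hypothetical (codepoint, glyph id) pairs by glyph id,
/// keeping every codepoint that points at the same glyph.
fn codepoints_by_glyph(mappings: &[(char, u16)]) -> BTreeMap<u16, Vec<char>> {
    let mut by_glyph: BTreeMap<u16, Vec<char>> = BTreeMap::new();
    for &(c, glyph) in mappings {
        by_glyph.entry(glyph).or_default().push(c);
    }
    by_glyph
}
```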
Sorry for the many edits, I forgot that Ctrl+Enter posts the comment on GitHub.
from ttf-parser.
I'm still not sure how to implement this. Even the basic gid-to-char variant.
The method you proposed is completely incorrect. This is not how cmap works. Not to mention that CFF can have its own encoding.
from ttf-parser.
Why is simply iterating over all cmap entries incorrect? cmap is essentially a table mapping from a codepoint to a glyph ID.
If I am not mistaken (which I very well might be), simply iterating over all entries would yield all the available mappings.
from ttf-parser.
cmap can have multiple encodings. Which one should be used? Again, duplicates, variation glyphs, etc. There is no single, correct way to implement this. And ttf-parser strictly follows the spec. Any non-trivial stuff should be done manually.
from ttf-parser.
The idea behind this feature would be to just simply iterate over the characters.
I understand that this is not something that can be done "correctly" in a strict definition.
I think the best solution for this (besides a fork, like it is now) would be for me to publish a crate that does this using Face::table_data() and link that from this issue for people to find.
It's up to you whether you want to keep this issue open until this feature makes it into ttf_parser.
from ttf-parser.
I have no plans on implementing this feature, sorry.
from ttf-parser.
Since the code is coupled to a lot of internal types, I don't think it's possible to extract it to an external crate.
I don't completely understand why it is not exactly following the specification, but it's your crate and your decision, no need to say sorry.
from ttf-parser.
Easy. Can you link another library that does this? FreeType doesn't seem to support it.
from ttf-parser.
No, that is why I wrote this and tried to get this in. No other libraries have this and I thought this would be a unique feature to this one.
from ttf-parser.
Sorry for the drive-by comment, but does the requested feature need something like the hb_face_collect_unicodes API?
from ttf-parser.
No other libraries support this for a reason.
from ttf-parser.
@ebraminio hb_face_collect_unicodes collects codepoints only from a single subtable.
And it looks like it ignores variation glyphs, which is kinda pointless? You have to use collect_variation_selectors and collect_variation_unicodes instead.
from ttf-parser.
I don't know the implementation details, but yes, that is pretty much what this is. I don't know how much it matters whether or not it only covers one table.
And @RazrFalcon see hb_face_collect_variation_selectors and hb_face_collect_variation_unicodes for that.
Edit: you already noticed, sorry, I was just typing my comment :)
from ttf-parser.
I guess we can do this as a cmap::Subtable method. Without any high-level API. Which also means no C API, in case you need one.
from ttf-parser.
What exactly would be the problem with doing this on the whole cmap? If you are afraid of duplicate characters mapping to different glyphs, an easy way to resolve that would be to call Face::glyph_index and see if it resolves to the same GlyphId. That way we guarantee that if you access that character later on, you will get the same glyph.
from ttf-parser.
> to call Face::glyph_index
This will be absurdly slow. What's wrong with using a specific subtable?
from ttf-parser.
Nothing, I wanted to know what the objections would be against calling Face::glyph_index.
I completely understand that it would be an extremely slow operation for large fonts.
Thanks for the amount of time you've already put into this issue.
from ttf-parser.
I'd like to have this feature too.
> I guess we can do this as cmap::Subtable method. Without any high-level API. Which also means no C API, in case you need one.
I think it's a good idea to do this at the subtable level. There is enough information in a subtable anyway, and we could collect the results into something like a Vec outside the crate.
FYI, I'm coming from this patch and would like to switch to ttf-parser, as rusttype did.
from ttf-parser.
I am also interested in this. My use case is indexing fonts to later quickly check whether the font has a specific character without loading it.
I looked at this a little, and my approach would be along these lines:
- Add iter methods for each format next to the parse methods. These would either return Option<impl Iterator<Item=(u32, u16)>> or just impl Iterator<Item=(u32, u16)>. I think the former would be simpler because the parsing code uses lots of early returns. For the latter, we would need a MaybeIterator enum that is empty in case we would return early. Generally, the iter methods would probably be quite similar to the parse methods, but I'm not sure there's a way around this duplication.
- Then, add an iter (or maybe a bit more expressively named codepoint_glyph_pairs) method to Subtable, which would return Option<impl Iterator<Item=(u32, GlyphId)>> (this time the u16s are mapped to GlyphIds). Since each format's iter method returns a distinct type, this would need to return either a trait object with dynamic dispatch (which I guess is not possible for this library) or a large enum which is generic over each table's iterator and emulates dynamic dispatch.
If this sounds somewhat sensible, I could try to start implementing this. (Maybe not all formats right away; I fear it might get a bit difficult to express the more complex ones as iterators. I wish Rust had stable generators ...)
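The enum-based dispatch mentioned in the second bullet could be sketched as follows. All type names here (ByteTableIter, RangeIter, SubtableIter) are hypothetical and heavily simplified: the first imitates a flat byte array indexed by codepoint, the second a single contiguous range, and the Empty variant plays the role of the MaybeIterator idea for early returns. None of this is actual ttf-parser code.

```rust
/// Hypothetical iterator over a flat array where the index is the
/// codepoint and the value is the glyph id (0 means "no glyph").
struct ByteTableIter<'a> {
    glyphs: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for ByteTableIter<'a> {
    type Item = (u32, u16);

    fn next(&mut self) -> Option<Self::Item> {
        while self.pos < self.glyphs.len() {
            let codepoint = self.pos;
            self.pos += 1;
            let glyph = self.glyphs[codepoint];
            if glyph != 0 {
                return Some((codepoint as u32, u16::from(glyph)));
            }
        }
        None
    }
}

/// Hypothetical iterator over one contiguous codepoint range.
struct RangeIter {
    pos: Option<u32>,
    start: u32,
    end: u32,
    start_glyph: u16,
}

impl RangeIter {
    fn new(start: u32, end: u32, start_glyph: u16) -> Self {
        RangeIter { pos: Some(start), start, end, start_glyph }
    }
}

impl Iterator for RangeIter {
    type Item = (u32, u16);

    fn next(&mut self) -> Option<Self::Item> {
        let codepoint = self.pos?;
        if codepoint > self.end {
            return None;
        }
        // `checked_add` yields None at u32::MAX, which ends the iteration.
        self.pos = codepoint.checked_add(1);
        let offset = (codepoint - self.start) as u16;
        Some((codepoint, self.start_glyph.wrapping_add(offset)))
    }
}

/// One enum over all per-format iterators: static dispatch via `match`
/// instead of a boxed trait object.
enum SubtableIter<'a> {
    Bytes(ByteTableIter<'a>),
    Range(RangeIter),
    Empty,
}

impl<'a> Iterator for SubtableIter<'a> {
    type Item = (u32, u16);

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            SubtableIter::Bytes(it) => it.next(),
            SubtableIter::Range(it) => it.next(),
            SubtableIter::Empty => None,
        }
    }
}
```

Each additional format adds one variant and one match arm, which is the duplication the comment above worries about.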
from ttf-parser.
Iterators are indeed difficult for the complex tables; that's why I proposed using a callback instead of an iterator.
For applications that are really just going to iterate over it once, this makes no difference. The only negative I can see is that .collect() would not be available, but just pushing every element to a Vec shouldn't be that hard.
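Pushing into a Vec from a callback could look like the sketch below. Group is a hypothetical, simplified version of a sequential map group (start codepoint, end codepoint, starting glyph id), and for_each_mapping and collect_mappings are made-up names used only for illustration.

```rust
/// Hypothetical, simplified sequential map group: codepoints
/// `start_char..=end_char` map to consecutive glyph ids.
struct Group {
    start_char: u32,
    end_char: u32,
    start_glyph: u32,
}

/// Callback-style traversal: no iterator state has to be stored
/// between calls, the nested loops simply run to completion.
fn for_each_mapping(groups: &[Group], mut f: impl FnMut(u32, u32)) {
    for group in groups {
        for codepoint in group.start_char..=group.end_char {
            let glyph = group.start_glyph.wrapping_add(codepoint - group.start_char);
            f(codepoint, glyph);
        }
    }
}

/// Replacement for `.collect()`: push every pair into a Vec.
fn collect_mappings(groups: &[Group]) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    for_each_mapping(groups, |codepoint, glyph| out.push((codepoint, glyph)));
    out
}
```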
from ttf-parser.
@laurmaedje I think the bigger problem is that Iterator will be very slow. Too many unnecessary, indirect calls. The better solution is to use a callback. Like:
subtable.codepoints(|c| println!("{}", c));
It's not as nice as iterators, since you can't use filter() and stuff. But iterators would be far less efficient.
PS: there will be no duplicates, since we're working on the subtable level.
from ttf-parser.
Yeah, I guess you are both right, that's the better approach here. I can try implementing that!
The remaining question would be what the exact API is here. For one, whether the callback is FnMut(u32) or FnMut(u32, GlyphId). I think the latter would be a little more flexible and faster in case you need the glyph IDs, but the former would probably lead to less code duplication (no need to parse glyph indices, just codepoints) and would be a bit faster when you don't need the glyph IDs.
Also, what would happen when there is an error? Should codepoints return something indicating an error condition, or will the callback simply not be called?
from ttf-parser.
I guess just u32 is good enough for now. And you can ignore subtable 14 for now.
On error, you should simply stop parsing. ttf-parser doesn't provide error reporting anyway. We assume that the font is valid.
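The "stop parsing on error" behaviour with an FnMut(u32) callback could be sketched like this. The free function codepoints below is hypothetical and only handles a bare array of 12-byte big-endian groups (start codepoint, end codepoint, start glyph id), without any subtable header; it is an assumption-laden illustration, not the library's implementation.

```rust
/// Call `f` for every codepoint covered by a raw array of 12-byte,
/// big-endian groups. Stops silently at the first malformed group.
fn codepoints(groups: &[u8], mut f: impl FnMut(u32)) {
    for chunk in groups.chunks(12) {
        if chunk.len() < 12 {
            return; // truncated group: stop without reporting an error
        }
        let start = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        let end = u32::from_be_bytes([chunk[4], chunk[5], chunk[6], chunk[7]]);
        if start > end {
            return; // invalid range: stop as well
        }
        // Bytes 8..12 hold the start glyph id, which is not needed here.
        for codepoint in start..=end {
            f(codepoint);
        }
    }
}
```

Codepoints emitted before the malformed group have already been passed to the callback, so the caller sees a partial result rather than an error value.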
from ttf-parser.