Initially reported on Alioth by Michael Terry (03/05/2009).

Implement suppression POT files to remove strings from translations (like pot_in, but more user-friendly)

mquinson commented on July 18, 2024

Implement suppression POT files to remove strings from translations (like pot_in, but more user-friendly)

from po4a.

Comments (17)

osamuaoki commented on July 18, 2024

On Sat, Apr 18, 2020 at 02:10:12PM -0700, Martin Quinson wrote:

> Hello there. Actually, there is a preliminary implementation already in po4a :) If you specify the pot_in for a given document, this is the file used to build the POT and PO files. We have an example in t-02-addendums/book-potin.conf (that I plan to rewrite as I do for all tests currently):
>
> ...
>
> But this is very cumbersome, because one has to implement the filtering externally, which kinda goes against the whole spirit of the po4a binary as opposed to the po4a-* tools.

I see.

> I'd prefer to have a filter, as @osamuaoki proposed. I still need to think of how to express such a filter in the config file.

It took me a while to understand exactly what you are talking about. I hope I understood it correctly.

I proposed pot_in as a feature enhancement to po4a to allow an equivalent process with po4a alone, as documented in the POD as below.

|| Special case with specifying B<pot_in>:
||
|| <- source files ->|<--------- build results ----------------->
||
|| master document --+--------------------------+
||                   :                          |
|| external          :         filtered         |
|| filtering ========X..>       master          |
|| program                     document         |
||                                 |            |
||                                 V            +--> translations
|| old PO files ----------+--> updated PO files +
||     ^                           |
||     |                           V
||     +<..........................+
||     (the updated PO files are manually
||      copied to the source of the next
||      release while manually updating
||      the translation contents)

This was meant to be the simplest use-case demonstration of "pot_in".

FYI: Currently, I use po4a-* tools embedded in a Makefile.

Let's consider cases.

Case 1. debian-reference
~~~~~~~~~~~~~~~~~~~~~~~~

I haven't migrated to the new po4a yet ;-) But my Makefile for debian-reference does as follows using po4a-*:

|| <---------------------------------------- source files ->|<--------- build results ---------------------------->
||
|| non-XML                                    +-----> master document ------------------+--> English XML --+--> HTML
|| master document template -+    external  --+       (master) XML                      |                  |
||                           +--> merging                                               |                  +--> PDF
|| supplemental data --------+    program   --+                                         |
|| non-XML     ^                  to generate +----> filtered master document           |
||             |                  XML files          (pot_in)              |   XML      V
||             |                                                           |(pot)       +--> translations -+--> HTML
||             |                                                           V            ^                  |
|| generation script                       old PO files ----------+--> updated PO files +        XML       +--> PDF
|| wget/sed/...                                ^  (po)               (po)  |
||                                             |                           V
||                                             +<..........................+
||                                             (the updated PO files are manually
||                                              copied to the source of the next
||                                              release while manually updating
||                                              the translation contents)

For this case, both (master) and (pot_in) should be generated at the same time by the external merging program even after migrating to po4a.
With the (pot_in) feature, I can migrate to po4a.

Case 2. XML attribute support with XSLT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As discussed in: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607726#21

The basic idea is to use tags with an attribute in the original XML file ... like:

    <screen translate="no">...</screen>
    <filename translate="no">...</filename>
    <para translate="no">...</para>

This provides much finer-grained control over which parts are translated than selecting by tags.

If we choose the attribute name from the DocBook-defined ones, maybe we use "role" as:

    <screen role="notranslate">...</screen>
    <filename role="notranslate">...</filename>
    <para role="notranslate">...</para>

Most likely, such an XML file with extra attributes can be used as (master) without modification to generate PDF etc. If you wish to make an XML file without the extra attributes, removing such attributes can be done by a trivial XSLT script run under the xsltproc command.

As for getting the (pot_in) filtered master document XML, removing such attribute-marked tag sections from the original XML is again a trivial XSLT script run under the xsltproc command.

Of course, an XSLT-script-based solution in conjunction with the use of (master) and (pot_in) is very versatile, though the use of XSLT scripts may be a bit cumbersome for many po4a users.

Case 3. Fine grained control support of NOTRANSLATE attribute in po4a

Instead of selecting by tags, fine-grained control support of a NOTRANSLATE attribute can be added to the po4a tool chain as a new feature. For XML, I think adding such a feature is relatively simple. For other formats such as POD or MAN, we can implement a similar feature by preceding each translation section with a carefully selected comment line etc.

Unlike
 * the tag-based approach as you indicated
 * the (master) and (pot_in) approach in cases 1 and 2 as I mentioned,
this approach should be much easier to figure out for many po4a users.

Regards,

Osamu
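As a rough illustration of the Case 2 filtering step, here is a minimal Python sketch that produces a (pot_in)-style document by dropping every element marked `role="notranslate"`. It stands in for the XSLT script mentioned above and is not part of po4a; the function name `strip_notranslate` is hypothetical, and it assumes the XML parses without external entity definitions, which real DocBook sources may need.

```python
import xml.etree.ElementTree as ET


def strip_notranslate(xml_text: str, attr: str = "role",
                      value: str = "notranslate") -> str:
    """Return xml_text with every element carrying attr=value removed."""
    root = ET.fromstring(xml_text)
    # Collect (parent, child) pairs first so the tree is not mutated
    # while it is being walked.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in list(parent)
        if child.get(attr) == value
    ]
    for parent, child in doomed:
        siblings = list(parent)
        index = siblings.index(child)
        # Text that follows the removed element (its "tail") belongs to the
        # parent's content, so re-attach it before removing the element.
        if child.tail:
            if index == 0:
                parent.text = (parent.text or "") + child.tail
            else:
                previous = siblings[index - 1]
                previous.tail = (previous.tail or "") + child.tail
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The output of such a filter would be written to a separate file and referenced as pot_in, while the unfiltered file stays the master document.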

osamuaoki commented on July 18, 2024

I agree that adding a specific solution with a narrow scope may complicate the code without much benefit. That's not a good thing. But I think we can do better by adding a generic feature cleanly, without including such translation-exclusion code within po4a. (There is such a need; see https://bugs.debian.org/607726 . As written there, the -o option may have some answer for XML, but it wasn't easy for me to implement.)

What po4a should offer is independent ways to specify 2 variants of the original English document: one to make the POT file, and another to make the translated text with the help of the PO file, in po4a.cfg.

Both of these should be generated by the external program.

This approach allows us to include much untranslatable content in many places. This is how I manage to include a lot of auto-generated statistical data in Debian Reference with a convoluted manual Makefile. If po4a supported this kind of feature, I could clean up my Makefile :-)

For XML source, we can write an XSLT filter to exclude specific tag contents such as ... for use as the input for the POT. The final translated document can be generated from the PO and the final English document with tag contents such as ... .

For non-XML source, we can deploy CPP preprocessor directives to enable similar things by pre-processing.

This approach should be non-invasive and clean, I think...

osamuaoki commented on July 18, 2024

This is a follow-up to my post yesterday.

As for implementing 2 English base input files, specifying this in the current po4a/po4a.cfg syntax isn't trivial and is very confusing.

I think the most reasonable approach is to create optional entries in po4a/po4a.cfg to set up custom prefilter programs:

[pot_prefilter]: optional entry to set up a prefilter for the input source text -> source text fed into the "po4a-gettextize -m" option input file (POT generation base file)
[translation_prefilter]: optional entry to set up a prefilter for the input source text -> source text fed into the "po4a-translate -m" option input file (translation file generation base file)

This approach should be compatible with existing syntax while adding very generic flexibility to po4a infrastructure.

osamuaoki commented on July 18, 2024

Hmmm.. maybe adding options to the po4a command for these prefilters would be even better.

mquinson commented on July 18, 2024

I like this idea of pre-filtering the input document before extracting the POT file. I think that this is a very appealing approach to solve this problem. Any help (or even better, patch) going in that direction would be really appreciated.

Thanks for the insight.

mquinson commented on July 18, 2024

Hello there.
Actually, there is a preliminary implementation already in po4a :)

If you specify the pot_in for a given document, this is the file used to build the POT and PO files. We have an example in t-02-addendums/book-potin.conf (that I plan to rewrite as I do for all tests currently):

[po4a_langs] ja
[po4a_paths] tmp/book.pot ja:t-02-addendums/book.po.ja

[type:docbook] t-02-addendums/book-auto.xml \
        pot_in:t-02-addendums/book.xml \
        ja:tmp/book-auto.ja.xml \
        add_ja:t-02-addendums/book.addendum1 \
        opt:"-k 0 -o nodefault=\"<bookinfo> <author>\" \
                  -o break=\"<bookinfo> <author>\" \
                  -o untranslated=\"<bookinfo>\" \
                  -o translated=\"<author>\""

We have:

--- t-02-addendums/book-auto.xml        2020-04-09 00:23:24.801047067 +0200
+++ t-02-addendums/book.xml     2020-04-09 00:23:24.801047067 +0200
@@ -59,11 +59,6 @@
   </totalfake>
 </bogustag>
 </chapter>
-<chapter><title>Title: Auto add text</title>
-<para>
-This is to emulate auto added non-translated content.
-</para>
-</chapter>
 <appendix><title>Title: Optional Appendix</title>
 <para>
 Appendixes are optional.

As a result, these strings are not added to the pot, so their translation is not found in the po, so they remain unchanged. So it ... works.

But this is very cumbersome, because one has to implement the filtering externally, which kinda goes against the whole spirit of the po4a binary as opposed to the po4a-* tools.

I'd prefer to have a filter, as @osamuaoki proposed. I still need to think of how to express such a filter in the config file.

mquinson commented on July 18, 2024

Hello @osamuaoki, thanks for the detailed answer.

I must however confess that I'm a bit lost here. You speak of the pot_in feature as something that would be desirable, but it's already implemented, right? I just pushed some tests to ensure that it will continue to work in the future.

So, maybe you mean that this bug can be closed because the filtering thing that I was suggesting is less useful? If so, I agree. I changed my mind in the meantime, and I think that it is much easier to keep the filtering out of the po4a program, which is already rather complex. I don't think that we can find a solution that fits all needs for specifying the filtering command line in the po4a.conf, so I take it back: pot_in is sufficient from my point of view, and we could close this issue.

What would be needed from your point of view to close this?

Thanks for your help,
Mt.

mquinson commented on July 18, 2024

Hello @osamuaoki, could you please help me understand what remains to be done before closing this issue?

Thanks in advance,

erciccione commented on July 18, 2024

I have some paragraphs in a text file (markdown) that I don't want to have translated since they mostly contain code. Ideally I would have a POT file with some paragraphs marked as "not to translate" that would be ignored during conversions, so as to keep them in English in the translated file.

I've been looking for ways to achieve that, but it's hard to find a solution. I now found this issue, but it's still not clear to me whether it's now possible to mark some paragraphs as not for translation. Is there currently a native way to achieve this? Is there a workaround I'm missing?

mquinson commented on July 18, 2024

Hello @erciccione, sorry for the delay.

Did you see https://po4a.org/man/man1/po4a.1.php#lbAN in the documentation?

If you've read the doc and it's not sufficient, could you please elaborate on your question? The idea is to produce a filtered file where the content you want to hide is removed. This filtered file should be used as pot_in.

Maybe your question is about how to produce that filtered file removing the content you want to hide? Well, this is not in the field of po4a: you have to filter it on your side, to produce the file that will be used as pot_in in po4a.

I'm not quite sure how I'd do this for text files. In markdown, I'd use specific markers in comments to indicate the beginning and end of each area to hide, and then I'd come up with a small crude Perl script to do the actual filtering.
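For what it's worth, the marker idea could look roughly like the sketch below, written in Python rather than Perl. The two marker strings and the function name `filter_hidden` are made up for illustration; po4a does not define or recognize them. The filtered output would be saved as a separate file and given to po4a as pot_in.

```python
# Hypothetical markers: any HTML comment text would do, as long as the
# master document and the filter agree on it.
BEGIN = "<!-- po4a: begin notranslate -->"
END = "<!-- po4a: end notranslate -->"


def filter_hidden(lines):
    """Yield only the lines that sit outside BEGIN/END marker pairs.

    The marker lines themselves are dropped as well.
    """
    hidden = False
    for line in lines:
        stripped = line.strip()
        if stripped == BEGIN:
            hidden = True
        elif stripped == END:
            hidden = False
        elif not hidden:
            yield line
```

A typical use would be to read the master markdown file line by line, pass the lines through `filter_hidden`, and write the result to the file named in pot_in.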

jnavila commented on July 18, 2024

I don't think a prefilter is the correct solution. If I understand correctly, po4a would not see the content that is tagged as not translated when generating the POT files, because it would simply be eliminated from the original content beforehand. But when po4a blends in the translations, the eliminated parts need to be present, and they would be counted as untranslated, thus skewing the translation statistics of the file and defeating the threshold logic.

mquinson commented on July 18, 2024

Well, that's the currently implemented solution :) What would you propose as a replacement?

mquinson commented on July 18, 2024

Just to be sure we are on the same page here, @jnavila: filtering has already been implemented and integrated into po4a for several years. If you want to update it to make it easier for the users, be my guest, but it is already working. There are even some tests.

One thing we could do is to improve Po.pm so that it does not count missing entries as untranslated. That should be rather easy to implement, but it could have bad side effects for people using the po4a-* subscripts in the wrong order. That's a drawback I could probably live with.
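To make the statistics concern concrete, here is a small arithmetic sketch in Python. The numbers, the function name `translated_percent`, and the 80% keep threshold are assumptions chosen for illustration; this is not how Po.pm actually computes its figures.

```python
def translated_percent(translated: int, untranslated: int) -> float:
    """Percentage of strings considered translated."""
    total = translated + untranslated
    return 100.0 * translated / total if total else 100.0


# Hypothetical document: 10 strings in the master file, 3 of them hidden
# from the POT via pot_in, and all 7 remaining strings translated.
in_po_translated = 7
missing_from_po = 3
keep_threshold = 80.0  # assumed value for the -k option

# Missing entries counted as untranslated: the file falls below the threshold.
counted = translated_percent(in_po_translated, missing_from_po)
# Missing entries ignored: the file is considered fully translated.
ignored = translated_percent(in_po_translated, 0)

print(counted, counted >= keep_threshold)
print(ignored, ignored >= keep_threshold)
```

Under these assumed numbers, the same fully translated PO file is either discarded or kept depending only on how the deliberately hidden strings are counted.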

jnavila commented on July 18, 2024

OK. Thank you for clearing up what's done and what could be enhanced. I cannot commit to changes right now.

osamuaoki commented on July 18, 2024

As far as functional features are concerned, I think this is a done deal. Line matching rules for addenda can now be created more intuitively, too.

As for ease of use for end users, we may need documentation on XML filtering by attribute, with an example XSLT and Makefile, since these are nontrivial for most people.

So let's rename this issue 77.

mquinson commented on July 18, 2024

Hello,

reading again the logs of this issue, I come to the conclusion that although the feature is implemented and documented, it is still very cumbersome to use. I like very much the idea of @erciccione, of suppression POT files: a POT file whose msgids get automatically marked as "not to translate". I think that it would be much easier to manage for the users, as you just have to check your (usual) POT to search for the entries that shouldn't be there, and copy/paste them unchanged to your suppression file to have them automatically removed. We could even probably warn about unused entries in the suppression file to ease the maintenance of this file (probably, because I'm not sure about split settings which could get in the way).

Internally, that shouldn't be too complex to implement, a bit like the po4a-gettextize internal behavior: after building the pot file from the master documents, just before writing it to disk, you load the suppression file in a new PO object, and then iterate over the entries of that PO object to remove those msgids from the master POT files.
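The filtering step described above can be sketched as follows. This is only an illustration in Python with a deliberately naive PO parser; a real implementation inside po4a would be written in Perl on top of its existing PO object, and the function names here (`split_entries`, `entry_msgid`, `suppress`) are hypothetical. It also returns the suppression entries that matched nothing, which is what a warning about unused entries could be based on.

```python
import re


def split_entries(po_text: str) -> list[str]:
    """Split PO/POT text into entries (blocks separated by blank lines)."""
    return [block for block in re.split(r"\n\s*\n", po_text.strip()) if block]


def entry_msgid(entry: str) -> str:
    """Return the raw (still escaped) msgid of one entry.

    Continuation lines of a multi-line msgid are joined together.
    """
    parts = []
    in_msgid = False
    for line in entry.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            in_msgid = True
            line = line[len("msgid "):]
        elif line.startswith("msg"):  # msgstr, msgctxt, msgid_plural, ...
            in_msgid = False
            continue
        if in_msgid and line.startswith('"') and line.endswith('"'):
            parts.append(line[1:-1])
    return "".join(parts)


def suppress(master_pot: str, suppression_pot: str) -> tuple[str, set[str]]:
    """Drop from master_pot every entry whose msgid is in suppression_pot.

    Returns the filtered POT text and the suppression msgids that matched
    nothing in the master POT.
    """
    blocked = {entry_msgid(e) for e in split_entries(suppression_pot)}
    blocked.discard("")  # never drop the header entry (empty msgid)
    kept, used = [], set()
    for entry in split_entries(master_pot):
        msgid = entry_msgid(entry)
        if msgid in blocked:
            used.add(msgid)
        else:
            kept.append(entry)
    return "\n\n".join(kept) + "\n", blocked - used
```

Matching on the joined msgid means that line wrapping differences between the master POT and the suppression file do not matter, which fits the copy/paste workflow described above.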

Unfortunately, I'm not sure I'll have time to implement this before releasing the long overdue v0.70, so I'm writing this to (1) confirm with you guys that this new feature would be the right answer to your need and (2) remember about it the next time that I find some time for po4a.

osamuaoki commented on July 18, 2024

Since my target is XML, filtering by XML tag is easy. I basically use po4a in 2 stages: once on the filtered XML to create the template for the PO file, and a second time with the original XML to produce the final result. But for markdown, this strategy doesn't work.

I agree that creating a blocking POT file is a reasonable idea to address this need in a data-source-neutral way.

msguniq-like filtering is all you need to implement.
