
Comments (10)

TinoDidriksen commented on September 28, 2024

and the "html" parser will convert everything inside <apertium-notrans> into superblanks...

...which is the problem. That ruins contexts. It'd be much better to let it have an analysis that makes sense in context.

Something like <apertium-notrans pos="np">STARTUP</apertium-notrans> which would get turned into ^STARTUP/STARTUP<np><m:notrans>$ by something and respected by lt-proc, or done by lt-proc directly.

Yes, it would require changes to lt-proc, but that will be needed anyway for things like markup handling. Might as well make that part generic also.
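
To make the proposal above concrete, here is a minimal sketch of the markup-to-analysis step. It is only an illustration under stated assumptions: the function name notrans_to_lu is hypothetical, the <m:notrans> secondary tag is the one suggested in this comment rather than an existing Apertium feature, and escaping of reserved stream characters is left out for brevity.

```python
import re

# Matches the proposed markup, e.g.
# <apertium-notrans pos="np">STARTUP</apertium-notrans>
NOTRANS_RE = re.compile(
    r'<apertium-notrans\s+pos="(?P<pos>[^"]+)">(?P<text>.*?)</apertium-notrans>',
    re.DOTALL,
)


def notrans_to_lu(text: str) -> str:
    """Rewrite each marked span as a lexical unit with a notrans secondary tag.

    Hypothetical helper: the surface form doubles as the lemma, the pos
    attribute becomes the primary tag, and <m:notrans> is appended.
    Reserved stream characters in the span are NOT escaped here.
    """

    def to_lu(match: re.Match) -> str:
        surface = match.group("text")
        pos = match.group("pos")
        return f"^{surface}/{surface}<{pos}><m:notrans>$"

    return NOTRANS_RE.sub(to_lu, text)


print(notrans_to_lu('Our <apertium-notrans pos="np">STARTUP</apertium-notrans> grows'))
```

A later stage (lt-proc or a separate tool, as discussed above) would still have to respect the tag; this sketch covers only the conversion.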

from apertium.

khannatanmai commented on September 28, 2024

@unhammer Secondary tags could help here.


TinoDidriksen commented on September 28, 2024

Yup. All sorts of outside information could be passed along this way.


xavivars commented on September 28, 2024

I think the key point here is "outside the pipeline". How would you add secondary tags to a word from outside the pipeline? I guess that would also require changes to lt-proc, so that it can turn the new "do-not-translate-this-word" markup into a secondary tag.


khannatanmai commented on September 28, 2024

@xavivars I know it says outside the pipeline, but as I understand it, whether or not to translate a word is something that will be computed in the pipeline (sort of a Named Entity Recognition thingy), unless you want to manually provide a list of words that shouldn't be translated. (Isn't this something that can be done in t1x already? Give a list, and if the word is part of that list, propagate the SL lemma. It won't be generated, but that's fine.)

Or when you say outside the pipeline, do you mean actually having a markup in the input corpus? If you mean that analysis of the word should produce the LU with a <don't translate> tag, that can certainly be done in the monodix if needed.
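
The list-based idea can be sketched outside of t1x like this. This is not transfer-rule syntax, just the logic in Python; transfer_lemma, bidix, and keep_as_is are hypothetical names, and the "@" prefix for a missing bilingual entry imitates how unknown lemmas are usually marked.

```python
def transfer_lemma(sl_lemma: str, bidix: dict, keep_as_is: set) -> str:
    """Pick the target-language lemma for one source-language lemma.

    Illustrative only: words on the keep-as-is list bypass the bilingual
    lookup and carry the source lemma through unchanged.
    """
    if sl_lemma in keep_as_is:
        # Propagate the SL lemma instead of translating it.
        return sl_lemma
    # Normal path: bilingual lookup, "@lemma" when no entry exists.
    return bidix.get(sl_lemma, "@" + sl_lemma)


bidix = {"startup": "oppstart", "house": "hus"}
print(transfer_lemma("startup", bidix, {"startup"}))
print(transfer_lemma("house", bidix, {"startup"}))
```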


xavivars commented on September 28, 2024

Yes, I mean completely outside the pipeline. Apertium currently supports doing that "from outside" the pipeline. You can send this text:

<apertium-notrans>This text will not be translated</apertium-notrans>, but this one will.

and the "html" parser will convert everything inside <apertium-notrans> into superblanks before the morphological analyser starts processing the text.
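
As a rough illustration of that behaviour, the sketch below wraps each marked span in square brackets, which is how superblanks appear in the Apertium stream. It is a simplified stand-in, not the real deformatter: to_superblanks is a hypothetical name, keeping the tags inside the brackets is an assumption, and escaping of reserved characters is omitted.

```python
import re

# One whole <apertium-notrans>...</apertium-notrans> span, tags included.
NOTRANS_SPAN = re.compile(r"<apertium-notrans>.*?</apertium-notrans>", re.DOTALL)


def to_superblanks(text: str) -> str:
    """Wrap every notrans span in [...] so later stages treat it as a blank.

    Simplified sketch: the span disappears from the analyser's view, which
    is exactly why its content cannot take part in any context.
    """
    return NOTRANS_SPAN.sub(lambda m: "[" + m.group(0) + "]", text)


sample = (
    "<apertium-notrans>This text will not be translated</apertium-notrans>, "
    "but this one will."
)
print(to_superblanks(sample))
```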


xavivars commented on September 28, 2024

Definitely, I'm not saying we can't touch lt-proc. My point was a reply to this:

@xavivars I know it says outside the pipeline, but as I understand it, whether to translate or not translate a word is something that will be computed in the pipeline.

Unhammer's request was being able to do that from outside the pipeline.


unhammer commented on September 28, 2024

Yes, Xavi interpreted my request correctly :) This was for users who don't want to / are not able to change the translator at all, but just need a way to make something in their texts untranslatable.


khannatanmai commented on September 28, 2024

Ah, alright, I thought the request was about a more general "mark words that one shouldn't translate". But my initial comment was about secondary tags helping and, as @TinoDidriksen said, the markup can be converted to a secondary tag (the same way we would convert html tags to secondary tags and attach them to word LUs).


ftyers commented on September 28, 2024

Note that this is also related to the issue of codeswitching, e.g. identifying and marking spans as not translatable because they are in another language. I put some thoughts here.
