Comments (8)

flammie commented on June 26, 2024

I think the best I've been able to come up with is a sed hack like:

    sed -e 's/$/§§/' | apertium -f line | sed -e 's/§.*//'
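
A minimal sketch of the marker trick above, wrapped in a reusable shell function. The name translate_lines is hypothetical. It assumes the marker "§§" never occurs in the input and that the command in the middle keeps each marker at the end of its own line (which, per the next comment, does not always hold). Everything from the first "§" on each output line is cut, so a legitimate "§" in the text would be truncated too.

```shell
# Hypothetical helper: mark the end of every line, run the given command,
# then cut each output line at the first marker character.
# Assumption: "§§" never appears in the input text.
translate_lines() {
    sed -e 's/$/§§/' | "$@" | sed -e 's/§.*//'
}

# With a real pair (xxx-yyy is a placeholder pair name):
#   translate_lines apertium -f line xxx-yyy < input.txt
# Dry run: cat stands in for apertium so this runs without a pair installed.
printf 'first line\nsecond line\n' | translate_lines cat
```

Anything the middle command appends after the marker on the same line is discarded along with the marker, which is what keeps stray trailing material from leaking into the output line.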

from apertium.

TinoDidriksen commented on June 26, 2024

We need proper block separation for markup handling anyway, similar to null-flush mode. Proper line handling could be done with that work.

ftyers commented on June 26, 2024

@flammie Even that doesn't always work...

TinoDidriksen commented on June 26, 2024

This remains unchanged even with the word-bound blank work, because word-bound blanks (WBBs) are an orthogonal problem. WBBs must move, split, and merge with their token, while what we need here is something that acts like a hard sentence boundary but not a hard context boundary. Normal whitespace should still behave as it does now, staying roughly in place.

In essence, we need what in CG terms would be delimiters: something that rules cannot move words across unless explicitly allowed to, but that still lets them inspect the other sentences for anaphora and context.

For markup handling, we reused null-flush because it is the only thing that is currently safe enough to strictly keep blocks from interacting. Non-null delimiters would need another code point, and it would probably be a significant change to how rules are run.
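
Until a dedicated delimiter exists, null-flush can also be used to keep individual lines from interacting. A minimal sketch, assuming the apertium script's -z (null-flush) switch and that the pipeline emits exactly one NUL after each NUL-terminated input block; translate_blocks is a hypothetical name and xxx-yyy a placeholder pair.

```shell
# Hypothetical helper: turn every newline into a NUL so each line becomes its
# own null-flushed block, run the command, then turn NULs back into newlines.
# Assumption: the command emits exactly one NUL per input block.
translate_blocks() {
    tr '\n' '\000' | "$@" | tr '\000' '\n'
}

# With a real pair (assumed flags, placeholder pair name):
#   translate_blocks apertium -z xxx-yyy < input.txt
# Dry run with cat standing in for apertium:
printf 'one\ntwo\n' | translate_blocks cat
```

Because the blocks are flushed one at a time, no rule can move material between lines, but for the same reason no module can look across them either, which is exactly the anaphora trade-off described above.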

khannatanmai commented on June 26, 2024

After the latest update to blank handling, transfer is guaranteed to output the same number of line breaks as it receives. Transfer is where line breaks used to disappear (or get added).
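
One way to test that guarantee on a given stage is to compare newline counts before and after it. A minimal sketch; same_linebreaks is a hypothetical helper, and the stage command is whatever transfer invocation your pipeline uses (left as a placeholder here rather than guessing its flags).

```shell
# Hypothetical check: does a pipeline stage preserve the number of line breaks?
# Buffers stdin in a temp file, runs the stage on it, and compares wc -l counts.
# Returns 0 if the counts match, 1 if they differ, 2 if mktemp fails.
same_linebreaks() {
    tmp=$(mktemp) || return 2
    cat > "$tmp"
    # $(( )) strips any padding that some wc implementations print
    in_count=$(( $(wc -l < "$tmp") ))
    out_count=$(( $("$@" < "$tmp" | wc -l) ))
    rm -f "$tmp"
    [ "$in_count" -eq "$out_count" ]
}

# Usage (placeholder command and file):
#   same_linebreaks <transfer command> < stage-input.txt || echo "line count changed"
printf 'a\nb\n' | same_linebreaks cat && echo "same number of line breaks"
```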

khannatanmai commented on June 26, 2024

I think recursive transfer does this now as well (@mr-martian?), so we can test it and close this issue.

mr-martian commented on June 26, 2024

Recursive blank handling has been updated and should be roughly equivalent to t*x blank handling. It frequently deals with much longer sequences than t1x, so the blanks end up offset by more, but the basic algorithm is the same.

mr-martian commented on June 26, 2024

Somehow I'm still getting a different number of line breaks out of tx for br-fr. I'm sure it's tx, because rtx in an otherwise identical pipeline doesn't do that.

I think at present -anaphora is the only module that would ever need to look across multiple lines. So what if -wblank-mode had an option that passed a different argument to -anaphora, letting it look across nulls?

On the other hand, I have several times found myself wanting a way to translate each line of a file independently, so maybe -anaphora shouldn't be treated differently (or maybe we need two different modes).
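
The bluntest way to get fully independent per-line translation with today's tools is to start a fresh pipeline for every line. A minimal sketch; translate_each_line is a hypothetical name, xxx-yyy a placeholder pair, and it assumes the translator writes one output line per input line.

```shell
# Hypothetical helper: run the given command once per input line, so no state,
# blank, or context can carry over from one line to the next.
# The "|| [ -n "$line" ]" part also handles a final line without a newline.
translate_each_line() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | "$@"
    done
}

# With a real pair (placeholder pair name):
#   translate_each_line apertium xxx-yyy < input.txt
# Dry run with cat standing in for apertium:
printf 'x\ny\n' | translate_each_line cat
```

This guarantees isolation at the cost of paying pipeline startup for every line, so it is only practical for small files.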
