Comments (8)
The best workaround I've managed so far is a sed hack like sed -e 's/$/§§/' | apertium -f line | sed -e 's/§.*//'
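For anyone who wants to try the same idea outside the shell, here is a minimal Python sketch of that sentinel trick. It only mirrors the two sed steps: append §§ to every line, run the text through a translator, then cut each output line at the first §. The names protect_lines and translate are hypothetical; translate stands in for whatever actually calls the pipeline and is not an Apertium API.

```python
SENTINEL = "§§"  # same marker as in the sed hack


def protect_lines(text, translate):
    """Mark each line end, translate, then cut every line at the first '§'.

    `translate` is a hypothetical callable (str -> str) standing in for
    the real pipeline; it is an assumption, not part of Apertium.
    """
    # Equivalent of: sed -e 's/$/§§/'
    marked = "\n".join(line + SENTINEL for line in text.split("\n"))
    translated = translate(marked)
    # Equivalent of: sed -e 's/§.*//'
    return "\n".join(line.split("§", 1)[0] for line in translated.split("\n"))
```

Like the sed version, this cannot repair output in which the translator has already dropped or added a newline; it only removes the marker and anything after it on each line.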
from apertium.
We need proper block separation for markup handling anyway, similar to null-flush mode. Proper line handling could be done with that work.
@flammie even that doesn't work sometimes ...
This remains unchanged even with the word-bound blank work, because word-bound blanks are an orthogonal problem. WBBs must move/split/merge with the token, while what we need here is something that acts like a hard sentence boundary but not a hard context boundary. And normal whitespace should still behave as it currently does - staying sort of in place.
In essence, we need what in CG terms would be delimiters. Something that rules cannot move words across without being explicitly allowed to do so, but where one can still inspect the other sentences for anaphora and context.
For markup handling, we reused null-flush because it is the only thing that is currently safe enough to strictly keep blocks from interacting. Non-null delimiters would need another code point, and it would probably be a significant change to how rules are run.
After the latest update to blanks, the transfer input and output are guaranteed to contain the same number of line breaks. Transfer is where line breaks usually used to disappear (or get added).
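A quick way to test that claim on a given pipeline is to capture the text going into and coming out of a stage and compare newline counts. The sketch below assumes you already have both as strings (for example from running the mode step by step); the function names are made up for illustration.

```python
def count_line_breaks(text):
    """Number of newline characters in a captured stream."""
    return text.count("\n")


def line_breaks_preserved(stage_input, stage_output):
    """True if a stage emitted exactly as many newlines as it received."""
    return count_line_breaks(stage_input) == count_line_breaks(stage_output)
```

Running this per stage narrows down which module changes the count, rather than only seeing the difference at the end of the pipeline.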
I think recursive does this now as well (@mr-martian ?), so we can test this and close it.
Recursive blank handling has been updated and should be approximately equivalent to t*x blank handling. It frequently deals with much longer sequences than t1x, so the blanks wind up offset by more, but the basic algorithm is the same.
Somehow or other I'm still getting a different number of line breaks in tx for br-fr (and I can be sure it's tx because rtx in an otherwise identical pipeline doesn't do that).
I think at present -anaphora is the only module that would ever need to look at multiple lines, so what if -wblank-mode had an option that would give a different argument to -anaphora so it could look across nulls?
On the other hand, what I have now several times found myself wanting is a way to translate each line of a file independently, so maybe -anaphora shouldn't be treated differently (or maybe we need 2 different modes).
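To make the "each line independently" idea concrete, here is a rough Python sketch that frames every line with a NUL, in the spirit of null-flush, and refuses to continue if the number of blocks changes. Both translate_independently and translate_stream are hypothetical names; translate_stream stands in for a pipeline that passes NULs through unchanged, which is an assumption here rather than something this sketch checks against real modules.

```python
def translate_independently(lines, translate_stream):
    """Send each line as its own NUL-terminated block and split the result.

    `translate_stream` is a hypothetical callable (str -> str) assumed to
    keep every NUL in place, as a null-flush pipeline would.
    """
    stream = "".join(line + "\0" for line in lines)
    chunks = translate_stream(stream).split("\0")
    # The final NUL leaves an empty trailing chunk; drop it.
    if chunks and chunks[-1] == "":
        chunks.pop()
    if len(chunks) != len(lines):
        raise ValueError(
            "expected %d blocks, got %d" % (len(lines), len(chunks))
        )
    return chunks
```

With this framing nothing can look across lines, so it matches the second mode described above and not the one where -anaphora inspects neighbouring lines.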