Comments (7)
I am using the Bicleaner files downloaded from the repositories flagged on GitHub... Is there any way to bypass Bicleaner? I don't think it was mandatory previously...
from bitextor.
OK I just commented out the bicleaner...
Yes, I am trying to use Bicleaner, but I am seeing the following error:
Error in rule bicleaner:
jobid: 10
output: /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bicleaner.scores.xz
shell:
slang=$(egrep "source_lang" /opt/bitextor/bicleaner-model/en-fr.yaml | cut -d " " -f 2)
if [ "$slang" == "fr" ]; then
xzcat -T 0 -f /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz |
/opt/bitextor/preprocess/bin/cache -k 6 python3 /opt/bitextor/bicleaner/bicleaner/bicleaner_classifier_lite.py --score_only -q - - /opt/bitextor/bicleaner-model/en-fr.yaml |
paste <(xzcat -T 0 /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz) - |
xz -T 0 > /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bicleaner.scores.xz
else
xzcat -T 0 -f /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz |
awk ' BEGIN {FS=" "; OFS=" "} { t = $3; $3 = $4; $4 = t; print;}' |
/opt/bitextor/preprocess/bin/cache -k 6 python3 /opt/bitextor/bicleaner/bicleaner/bicleaner_classifier_lite.py --score_only -q - - /opt/bitextor/bicleaner-model/en-fr.yaml |
paste <(xzcat -T 0 /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz) - |
xz -T 0 > /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bicleaner.scores.xz
fi
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
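The note about bash strict mode explains why the whole rule fails when a single stage of the pipe aborts. Below is a minimal sketch of that behaviour, assuming `bash` is on the PATH and that strict mode means `set -euo pipefail`; the helper name `run_pipeline` is hypothetical and is not part of Bitextor or Snakemake.

```python
import subprocess


def run_pipeline(command, strict):
    """Run a shell pipeline in bash and return its exit code.

    With strict=True the command is prefixed with `set -euo pipefail`,
    so a failure in any stage of the pipe makes the whole pipeline fail.
    """
    prefix = "set -euo pipefail; " if strict else ""
    result = subprocess.run(
        ["bash", "-c", prefix + command],
        capture_output=True,
        text=True,
    )
    return result.returncode


if __name__ == "__main__":
    # `false` fails, but `cat` (the last stage) succeeds.
    print("non-strict:", run_pipeline("false | cat", strict=False))
    print("strict:    ", run_pipeline("false | cat", strict=True))
```

Without strict mode, bash reports only the exit status of the last command in the pipe, so a crash in the middle can go unnoticed. In strict mode the failing classifier stage is enough to make the rule exit with a non-zero code.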
Traceback (most recent call last):
File "/opt/bitextor/bicleaner/bicleaner/bicleaner_classifier_lite.py", line 135, in initialization
args.good_examples = metadata_yaml["good_examples"]
KeyError: 'good_examples'
terminate called after throwing an instance of 'util::EndOfFileException'
what(): End of file
/bin/bash: line 12: 376 Done xzcat -T 0 -f /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz
377 | awk ' BEGIN {FS=" "; OFS=" "} { t = $3; $3 = $4; $4 = t; print;}'
378 Aborted | /opt/bitextor/preprocess/bin/cache -k 6 python3 /opt/bitextor/bicleaner/bicleaner/bicleaner_classifier_lite.py --score_only -q - - /opt/bitextor/bicleaner-model/en-fr.yaml
379 | paste <(xzcat -T 0 /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bifixer.xz) -
380 | xz -T 0 > /opt/bitextor/transient-default-en-fr-small/sentence-pair-testing/hunalign.bicleaner.scores.xz
[Sat Mar 6 15:37:26 2021]
It seems you are using a newer Bicleaner model (probably built for version 0.14), while your Bitextor installation still uses an older Bicleaner version (0.13, IIRC). Try the models for Bicleaner 0.13: https://github.com/bitextor/bicleaner-data/releases/tag/v1.3
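A quick way to spot this kind of mismatch before running the pipeline is to check the model's YAML metadata for the keys the classifier reads. The sketch below is only an illustration: the key list contains just the two names that appear in this thread (`source_lang` from the rule's `egrep`, `good_examples` from the `KeyError`), the function names are hypothetical, and the line scan is a rough stand-in for a real YAML parser.

```python
import re
import sys
from pathlib import Path

# Keys taken from this thread only; this is NOT a complete list of what
# any Bicleaner version requires.
EXPECTED_KEYS = ("source_lang", "good_examples")

# A top-level key starts in column 0 and is followed by a colon.
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def top_level_keys(yaml_text):
    """Return top-level keys found by a rough line scan (not a full YAML parse)."""
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(yaml_text, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from the metadata text."""
    found = top_level_keys(yaml_text)
    return [key for key in expected if key not in found]


if __name__ == "__main__":
    # Usage: python check_model.py /opt/bitextor/bicleaner-model/en-fr.yaml
    text = Path(sys.argv[1]).read_text(encoding="utf-8")
    missing = missing_keys(text)
    if missing:
        print("Model metadata lacks keys:", ", ".join(missing))
        sys.exit(1)
    print("All expected keys present")
```

If `good_examples` is reported as missing, the model was most likely built for a different Bicleaner version than the one installed.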
OK, thanks for getting back to me so quickly... I will try.
Result: brilliant, what you suggested worked with the 0.13 models. Fantastic!