
liblouisutdml's Introduction

Introduction


Liblouisutdml is an open-source library providing complete braille transcription services for XML, HTML and text documents. It translates into the appropriate braille codes and formats the output according to its style sheet and the specifications in the document. A command-line program, file2brl, which uses this library, is also included. The latest version of liblouis is required. Java bindings are built into the library.

The library is licensed under the GNU Lesser General Public License (LGPL) version 3 or later. See the file COPYING.LIB.

The command-line tools are licensed under the GNU General Public License version 3.0 or later. See the file COPYING.

Documentation

For documentation, see liblouisutdml.html or liblouisutdml.txt. These are in the docs directory. For an example of a configuration file, see liblouisutdml.ini and preferences.cfg. For examples of semantics-action files, see dtbook.sem and nemeth.sem. These files are in the lbu_files subdirectory. For examples of translation tables, see en-us-g2.ctb, en-us-g1.ctb, chardefs.cti, nemeth.ctb and whatever other files they may include. These are all in the tables directory of liblouis.

Installation

First obtain the latest version of liblouis and compile it. Before compiling, you should choose between 16- and 32-bit Unicode, as described in the README file and the documentation. liblouisutdml inherits this choice from liblouis.

After unpacking the liblouisutdml distribution tarball, go to the directory it creates. Run configure, then make, then make install. You will need root privileges for the installation step.

This will produce the liblouisutdml library and the program file2brl. To compile the Java bindings go to the java subdirectory and run ant.

Note that the library and programs will not work properly unless you have first installed the latest version of liblouis.

Docker

Docker images are available for liblouisutdml and liblouis. To run liblouisutdml from Docker, type the following command, which brings you into a shell where you can invoke file2brl:

$ docker run -it liblouis/liblouisutdml /bin/bash
root@74a8b1ad5e03:/usr/src/liblouisutdml# file2brl --help
Usage: file2brl [OPTION] [inputFile] [outputFile]
Translate an xml or a text file into an embosser-ready braille file.
This includes translation into grade two, if desired, mathematical 
codes, etc. It also includes formatting according to a built-in 
style sheet which can be modified by the user.

If inputFile is not specified or '-' input is taken from stdin. If outputFile
is not specified the output is sent to stdout.

  -h, --help          	  display this help and exit
  -v, --version       	  display version information and exit
  -f, --config-file       name a configuration file that specifies
                          how to do the translation
  -b, --backward      	  backward translation
  -r, --reformat      	  reformat a braille file
  -T, --text		  Treat as text even if xml
  -t, --html              html document, not xhtml
  -p, --poorly-formatted  translate a poorly formatted file
  -P, --paragraph-line    treat each block of text ending in a newline
                          as a paragraph. If there are two newline characters
                          a blank line will be inserted before the next paragraph
  -C, --config-setting    specify particular configuration settings
                          They override any settings that are specified in a
                          config file
  -w  --writeable-path    path for temp files and log file
  -l, --log-file          write errors to file2brl.log instead of stderr

Report bugs to <[email protected]>.
root@74a8b1ad5e03:/usr/src/liblouisutdml# 

Docker for cross-compiling

You can use a Dockerfile to cross-compile liblouisutdml using mingw either for 32 or for 64 bit architecture:

# for 32 bit architecture
docker build -f Dockerfile.win32 .
# for 64 bit architecture
docker build -f Dockerfile.win64 .

Then grab the artifact from the docker container.

Or instead let the Makefile do it all for you:

make distwin

liblouisutdml's People

Contributors

bertfrees, dkager, dusek, egli, hammera, humenda, johnjboyer, kloczek, mwhapples, nsoiffer, rimas-kudelis, sthibaul, torchtrust


liblouisutdml's Issues

Providing Windows builds

I personally really miss a pre-built version for Windows (Win32). I am really not into C/C++, but I like how LiblouisUTDML works. On Linux, the compilation occurs almost automatically, but on Windows, it is more like a nightmare to me...

unable to build liblouisutdml 2.7.0 with liblouis 3.6.0

Hello,

I'm not quite sure if this belongs in liblouis or liblouisutdml. I'm currently working on upgrading to liblouis 3.6.0 (from 3.2.0). When I try to compile liblouisutdml 2.7.0 against liblouis 3.6.0 I get the following error.

/bin/bash /liblouisutdml-2.7.0/build-aux/missing makeinfo --plaintext liblouisutdml.texi -o liblouisutdml.txt
make[1]: Leaving directory '/liblouisutdml-2.7.0/doc'
Making all in lbu_files
make[1]: Entering directory '/liblouisutdml-2.7.0/lbu_files'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/liblouisutdml-2.7.0/lbu_files'
Making all in liblouisutdml
make[1]: Entering directory '/liblouisutdml-2.7.0/liblouisutdml'
make  all-am
make[2]: Entering directory '/liblouisutdml-2.7.0/liblouisutdml'
/bin/bash ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.  -I.. -DLBU_PATH=\"/usr/local/share/liblouisutdml/lbu_files/\" -DLIBLOUIS_TABLES_PATH=\"/usr/local/share/liblouis/tables/\" -DLBULIB -I/opt/jdk1.8.0_144/include -I/opt/jdk1.8.0_144/include/linux -I/usr/local/include/liblouis -I/usr/include/libxml2 -g -O2 -MT liblouisutdml_la-change_table.lo -MD -MP -MF .deps/liblouisutdml_la-change_table.Tpo -c -o liblouisutdml_la-change_table.lo `test -f 'change_table.c' || echo './'`change_table.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -DLBU_PATH=\"/usr/local/share/liblouisutdml/lbu_files/\" -DLIBLOUIS_TABLES_PATH=\"/usr/local/share/liblouis/tables/\" -DLBULIB -I/opt/jdk1.8.0_144/include -I/opt/jdk1.8.0_144/include/linux -I/usr/local/include/liblouis -I/usr/include/libxml2 -g -O2 -MT liblouisutdml_la-change_table.lo -MD -MP -MF .deps/liblouisutdml_la-change_table.Tpo -c change_table.c  -fPIC -DPIC -o .libs/liblouisutdml_la-change_table.o
In file included from change_table.c:34:0:
louisutdml.h:120:3: error: redeclaration of enumerator ‘ascii8’
   ascii8
   ^
In file included from louisutdml.h:36:0,
                 from change_table.c:34:
/usr/local/include/liblouis/internal.h:601:53: note: previous definition of ‘ascii8’ was here
 typedef enum { noEncoding, bigEndian, littleEndian, ascii8 } EncodingType;
                                                     ^
change_table.c: In function ‘change_table’:
change_table.c:49:7: warning: implicit declaration of function ‘logMessage’ [-Wimplicit-function-declaration]
       logMessage (LOG_ERROR, "Table %s cannot be found", newTable);
       ^
make[2]: *** [liblouisutdml_la-change_table.lo] Error 1
Makefile:801: recipe for target 'liblouisutdml_la-change_table.lo' failed
make[2]: Leaving directory '/liblouisutdml-2.7.0/liblouisutdml'
Makefile:660: recipe for target 'all' failed
make[1]: Leaving directory '/liblouisutdml-2.7.0/liblouisutdml'
make[1]: *** [all] Error 2
Makefile:702: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

Am I missing something here? Do I need to wait for a new release of liblouisutdml before upgrading?

testsuite memory leaks

Hello,

Building with

CFLAGS="-fsanitize=leak -g" ./configure
make
LSAN_OPTIONS="fast_unwind_on_malloc=0" make check

Shows some memory leaks such as

==1532626==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 6 byte(s) in 3 object(s) allocated from:
    #0 0x7f382b747545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
    #1 0x7f382b53742a in xmlStrdup (/lib/x86_64-linux-gnu/libxml2.so.2+0xd042a)
    #2 0x7f382b664e7f in set_sem_attr /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/semantics.c:1181
    #3 0x7f382b65c807 in examine_document /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/examine_document.c:48
    #4 0x7f382b65c9c7 in examine_document /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/examine_document.c:95
    #5 0x7f382b65c9c7 in examine_document /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/examine_document.c:95
    #6 0x7f382b65d252 in processXmlDocument /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/liblouisutdml.c:170
    #7 0x7f382b65d68d in lbu_translateFile /home/samy/brl/translation/liblouisutdml.git/liblouisutdml/liblouisutdml.c:283
    #8 0x55d3f68f9e9c in main /home/samy/brl/translation/liblouisutdml.git/tools/file2brl.c:351
    #9 0x7f382b2b17fc in __libc_start_main ../csu/libc-start.c:332
    #10 0x55d3f68f9249 in _start (/home/samy/ens/projet/1/translation/liblouisutdml.git/tools/.libs/file2brl+0x2249)

it should be possible to fix them.

Samuel

MathML results invalid hex character

What steps will reproduce the problem?
1.Translate the attached xml file
2.Examine the document to find the invalid 0x1b hex character
3.

What is the expected output? What do you see instead?
The translated utd or brf file will contain the 0x1b character.  For xml, this 
character is invalid and results in a fatal error when attempting to build 
the DOM.

What version of the product are you using? On what operating system?
I am using the current repo on a Windows 7 machine.

Please provide any additional information below.
The inclusion of the 0x1b character seems to occur in larger more complex 
documents containing MathML
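
A quick way to locate such characters in a translated file is to scan for code points that XML 1.0 does not allow. The following is a minimal sketch, not part of liblouisutdml; the function name find_invalid_xml_chars is made up for illustration, and the allowed ranges follow the Char production of the XML 1.0 specification.

```python
def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    def allowed(cp):
        # XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF]
        #               | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        return (
            cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )

    return [(i, ord(ch)) for i, ch in enumerate(text) if not allowed(ord(ch))]


# Hypothetical sample: an escape character (0x1b) embedded in translated output.
sample = "<brl>a\x1bb</brl>"
for offset, cp in find_invalid_xml_chars(sample):
    print(f"offset {offset}: U+{cp:04X}")
```

Running this over the utd output before building the DOM would show where the 0x1b characters sit, which may help narrow down which MathML construct produces them.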

Original issue reported on code.google.com by [email protected] on 8 Oct 2013 at 1:21

Attachments:

spurious spaces in output due to style changes

Hello,

When processing the attached file (yes, I had to re-zip it to make github happy) with unzip test.odt ; file2brl content.xml , I am getting "tete te tetete" instead of "tetetetetete" in just one word.

Indeed, the content.xml file contains

</text:span><text:span text:style-name="T10">tete</text:span><text:span text:style-name="T12">te</text:span><text:span text:style-name="T10">tetete</text:span>

where the T12 span is actually spurious because T12 is the same as T10. That is just a consequence of changing the character style etc. on part of the word and then changing it back again: LibreOffice still remembers the separation, which does happen in practice.

liblouisutdml should know that text:span text:style-name do not separate words, so that the output is tetetetetete.
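
The expected behaviour can be illustrated outside liblouisutdml with a small sketch that joins the text of adjacent spans without inserting a separator. It uses only the Python standard library; the helper name paragraph_text and the wrapping text:p element are assumptions added for the example, while the span markup is taken from the content.xml excerpt above.

```python
import xml.etree.ElementTree as ET

# The OpenDocument "text" namespace, needed so the text: prefix resolves.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

sample = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:span text:style-name="T10">tete</text:span>'
    '<text:span text:style-name="T12">te</text:span>'
    '<text:span text:style-name="T10">tetete</text:span>'
    '</text:p>'
)


def paragraph_text(xml_fragment):
    """Join the text of all inline children without inserting separators."""
    root = ET.fromstring(xml_fragment)
    return "".join(root.itertext())


print(paragraph_text(sample))
```

The point of the sketch is only that inline span boundaries carry no whitespace of their own, so a word split across several spans should reach the translator as one word.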

test.odt.zip

Generating Unicode Braille Patterns

I am trying to generate Unicode Braille Patterns by post processing the output of file2brl.

However, I do not understand what the default output format is. I suspected it to be Braille ASCII. However, this does not seem to be true: providing the German word "höhe" as input and using the German grade 0 table (de-de-g0.utb) yields "h9he" instead of the expected "h[he". What is the output format generated by the default tables?
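
For reference, this is roughly what such a post-processing step looks like if the input really is North American Braille ASCII, which is exactly the assumption questioned above. The sketch is independent of liblouis; the names BRAILLE_ASCII and brf_to_unicode are made up for the example. The table lists the 64 Braille ASCII characters in dot-pattern order, so the character at index i maps to U+2800 + i.

```python
# Braille ASCII characters ordered by dot pattern: the character at index i
# corresponds to the Unicode braille pattern U+2800 + i.
BRAILLE_ASCII = (
    " A1B'K2L"
    "@CIF/MSP"
    '"E3H9O6R'
    "^DJG>NTQ"
    ",*5<-U8V"
    ".%[$+X!&"
    ";:4\\0Z7("
    "_?W]#Y)="
)


def brf_to_unicode(text):
    """Map Braille ASCII to Unicode braille patterns; keep anything else as is."""
    out = []
    for ch in text:
        index = BRAILLE_ASCII.find(ch.upper())
        out.append(chr(0x2800 + index) if index >= 0 else ch)
    return "".join(out)


print(brf_to_unicode("h[he"))  # the output expected in the question
print(brf_to_unicode("h9he"))  # the output actually produced
```

In this table "[" is dots 246 and "9" is dots 35, so the two outputs differ in the second cell. If the display table in use is not Braille ASCII, a different mapping table would be needed.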

A function overflows or underflows an array access.

Hi
When building on the Open Build Service I get the following warning in the
brp checks;

I: A function overflows or underflows an array access. This could be a real 
error, but occasionaly this condition is also misdetected due to loop unrolling 
or strange pointer handling. So this is warning only, please review.
W: liblouisutdml arraysubscript transcriber.c:511, 379


Which is in this code;

      for (numBytes = MAXBYTES - 1; numBytes >= 0; numBytes--)
        if (ch >= first0Bit[numBytes])
          break;
      utf32 = ch & (0XFF - first0Bit[numBytes]);

It would appear that numBytes can get to -1, hence the issue.
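
The loop's behaviour can be checked with a small Python model. This is a sketch only: the values of MAXBYTES and the threshold table are assumptions modelled on the usual UTF-8 lead-byte boundaries, not copied from transcriber.c, and the function name final_index is made up.

```python
MAXBYTES = 7
# Assumed threshold values (UTF-8 lead-byte boundaries); the real table may differ.
FIRST0BIT = [0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE]


def final_index(ch):
    """Mimic the C loop and return numBytes as it stands after the loop."""
    num_bytes = MAXBYTES - 1
    while num_bytes >= 0:
        if ch >= FIRST0BIT[num_bytes]:
            break
        num_bytes -= 1
    return num_bytes


print(final_index(0xC3))  # a lead byte: the loop breaks at a valid index
print(final_index(0x41))  # a byte below every threshold: the loop runs out
```

Under these assumptions the index only reaches -1 for bytes below the smallest threshold. Whether that can happen in practice depends on whether such bytes are filtered out before the loop, which is worth checking in the surrounding code.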

Original issue reported on code.google.com by [email protected] on 5 Feb 2012 at 4:06

Compile time error

Fedora 24, liblouis 2.6.0

I rolled back to 2.5.0 and got it to, at least, compile properly.

logging.c:65:42: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
static logcallback logCallbackFunction = defaultLogCallback;
^~~~~~~~~~~~~~~~~~
logging.c: In function 'lbu_registerLogCallback':
logging.c:69:25: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
logCallbackFunction = defaultLogCallback;
^
logging.c: At top level:
logging.c:130:1: error: conflicting types for 'defaultLogCallback'
defaultLogCallback (logLevels level, const char *message)
^~~~~~~~~~~~~~~~~~
logging.c:63:13: note: previous declaration of 'defaultLogCallback' was here
static void defaultLogCallback(int level, const char *message);
^~~~~~~~~~~~~~~~~~
logging.c:63:13: warning: 'defaultLogCallback' used but never defined
Makefile:824: recipe for target 'liblouisutdml_la-logging.lo' failed

Fix cross-buildability

Hello,

As reported on http://bugs.debian.org/926061 , liblouis currently cannot be cross-built because it looks for pkg-config by hand and doesn't find jni because it uses an outdated ax_jni_include_dir.m4.

The attached patch cross.patch.txt fixes the pkg-config part by just letting PKG_CHECK_MODULES handle it already.

The jni part can be fixed by just removing the copy of ax_jni_include_dir.m4 (and let it be obtained from autoconf-archive) or updating it from autoconf-archive.

Samuel

Add missing declarations of do_pagenum and utd2dsBible to liblouisutdml.h

diff --git a/liblouisutdml/louisutdml.h b/liblouisutdml/louisutdml.h
index eb84639..39ff35d 100644
--- a/liblouisutdml/louisutdml.h
+++ b/liblouisutdml/louisutdml.h
@@ -365,6 +365,7 @@ void do_reverse (xmlNode * node);
int do_boxline (xmlNode * node);
void do_pagebreak (xmlNode *node);
void do_linespacing (xmlNode * node);
+int do_pagenum ();
int do_newpage ();
int do_blankline ();
int do_softreturn ();

diff --git a/liblouisutdml/louisutdml.h b/liblouisutdml/louisutdml.h
index 39ff35d..809f6c8 100644
--- a/liblouisutdml/louisutdml.h
+++ b/liblouisutdml/louisutdml.h
@@ -406,6 +406,7 @@ int wc_string_to_utf8 (const widechar * instr, int *inSize, unsigned
void output_xml (xmlDoc *doc);
int convert_utd ();
int utd2bible (xmlNode * node);
+int utd2dsBible (xmlNode * node);
int utd2brf (xmlNode * node);
int utd2pef (xmlNode * node);
int utd2transinxml (xmlNode * node);

No obvious public available specification of UTDML

There is no obvious, if any, public specification of the UTDML format. There 
should ideally be a link to it from the project home page, available as a 
download, bundled in the releases of liblouisutdml or in the project wiki. A 
UTDML format specification is not available in any of these locations.

Original issue reported on code.google.com by mwhapples on 14 Oct 2013 at 6:02

some more tests fail with liblouis 3.20

Hello,

After upgrading to liblouis 3.20, some more tests have started failing:

FAIL: mathml_nemeth/mover_09
============================

../../lbu_files/nemeth.sem:34: Action or style or macro 'matrix' in column 1 not recognized
FAIL mathml_nemeth/mover_09.test (exit status: 1)

FAIL: mathml_nemeth/mover_10
============================

../../lbu_files/nemeth.sem:34: Action or style or macro 'matrix' in column 1 not recognized
FAIL mathml_nemeth/mover_10.test (exit status: 1)

FAIL: mathml_nemeth/mover_11
============================

../../lbu_files/nemeth.sem:34: Action or style or macro 'matrix' in column 1 not recognized
FAIL mathml_nemeth/mover_11.test (exit status: 1)

FAIL: mathml_nemeth/mover_12
============================

../../lbu_files/nemeth.sem:34: Action or style or macro 'matrix' in column 1 not recognized
FAIL mathml_nemeth/mover_12.test (exit status: 1)

[...]

FAIL: mathml_nemeth/munder_04
=============================

../../lbu_files/nemeth.sem:34: Action or style or macro 'matrix' in column 1 not recognized
FAIL mathml_nemeth/munder_04.test (exit status: 1)

[...]

FAIL: orphanControl_01
======================

pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:27: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:27: invalid literaryTextTable
pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:27: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:27: invalid literaryTextTable
FAIL orphanControl_01.test (exit status: 1)

[...]

FAIL: printPageNumberRange_01
=============================

pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:28: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:28: invalid literaryTextTable
pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:28: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:28: invalid literaryTextTable
FAIL printPageNumberRange_01.test (exit status: 1)

FAIL: printPageNumberRange_02
=============================

pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:28: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:28: invalid literaryTextTable
pagenum.cti:27: error: The uplow opcode is deprecated.
1 errors found.
nabcc.dis,whitespace.cti,identity.cti,pagenum.cti could not be compiled
liblouisutdml.ini:28: Table 'nabcc.dis,whitespace.cti,identity.cti,pagenum.cti' cannot be found.
liblouisutdml.ini:28: invalid literaryTextTable
FAIL printPageNumberRange_02.test (exit status: 1)

Samuel

Liblouis UTDML: joinword opcode and hyphenation do not cooperate correctly

A Hungarian user reported an interesting problem with Liblouis UTDML and hyphenation, related to the hu-hu-g2.ctb table:
Sometimes the Liblouis UTDML file2brl command does not hyphenate some text lines when the us-table.dis, hu-hu-g2.ctb and hyph_hu_HU.dic tables are used.
The display table does not matter, because I reproduced this issue with the unicode.dis table too.

Usually, this issue happens when the hu-hu-g2.ctb table needs to use the joinword opcode.
The Hungarian grade 2 standard requires joining the words "a" and "az" with the next word, and no space is put between a comma and the word that follows it.

A few example strings with US braille output:
Normal text:
"Nem hiába mondja azt a példabeszéd, hogy minden füstnek tűz"

US braille lines when the us-table.dis,hu-hu-g2.ctb,hyph_hu_HU.dic tables are used:
"nemhi@ba mondja a<t
1p*ldabe:*d1h mden f(stnek t)<

The first line ends at the 20th character (the maximum line length is 32 characters), and the next line holds the rest.
The louis.hyphenate function hyphenates the word példabeszéd correctly:
pél-da-be-széd

Normal text:
"Olyan könnyű helyen tartja a fogait."
The first line ends at the 24th character, and the next line holds the rest.
Braille output for this text when the us-table.dis,hu-hu-g2.ctb,hyph_hu_HU.dic tables are used:
o_an kq$$) he_en tartja
1fogait'

The louis.hyphenate function hyphenates the words "fogait" and "fogait." correctly:
Hyphenated words:
fo-ga-it
fo-ga-it.

The user needs to prepare an embossable braille document; line lengths of 20 and 24 characters are very short for these example texts.

If the Hungarian user uses the hu-hu-g1.ctb table, this issue is not reproducible, because that table does not use the joinword opcode for these words and comma characters.

Attila

Incorrectly encoding text when backFormat is text

When performing backtranslation with file2brl, if the configuration has backFormat set to text and the text resulting from the backtranslation contains unicode characters outside the ASCII range these will be incorrectly encoded.
As an example, using en-ueb-g2.ctb as the translation table try back translating a word containing an apostrophe (eg. I'M, CAN'T, etc). This results in the apostrophe being produced as the byte 0x19.
Having tested file2brl with backFormat set to html, it appears that in this example the apostrophe gets backtranslated to unicode character \u2019. I therefore suspect file2brl is simply removing the higher byte of the unicode characters when backFormat is set to text.
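
The suspicion is easy to check arithmetically. The sketch below is plain Python and does not call file2brl; the helper name truncate_to_low_byte is made up. It shows that keeping only the low byte of U+2019 gives 0x19, and what a UTF-8 encoding of the same character looks like instead.

```python
def truncate_to_low_byte(ch):
    """Keep only the lowest 8 bits of a character's code point."""
    return ord(ch) & 0xFF


apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK
print(hex(truncate_to_low_byte(apostrophe)))  # the byte observed in the output
print(apostrophe.encode("utf-8"))             # what a UTF-8 text file would hold
```

The low byte matches the 0x19 seen in the text output, which is consistent with the high byte being dropped rather than the character being encoded.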

Using file2brl with --backward option

I'm endeavoring to use the file2brl command.
It seems that the following should give me back what I started with but I get back HTML tags with 'NO TITLE' instead of the original text. What am I misunderstanding?

$ echo 'This is a test.' >input.txt
$ file2brl --text input.txt output.brf
$ file2brl --backward output.brf
<html><head><title>No Title</title></head><body><p></p></body></html>

My need is to convert a .brf file into a readable text file.

Beveled fractions are not converted correctly to Nemeth

The beveled fraction test (mathml_nemeth/mfrac_02.test) fails:

Output ?A+B/C+D#
Expected ?A+B_/C+D#

I believe the problem is that there is nothing that checks for the bevelled attr used in mfrac:

<mfrac bevelled="true">
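
The missing check amounts to reading one attribute. As an illustration only, here it is in Python with the standard library rather than in liblouisutdml's C code; the helper name is_bevelled and the sample MathML are made up for the example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def is_bevelled(mfrac):
    """True when an mfrac element carries bevelled="true"."""
    return mfrac.get("bevelled", "false").strip().lower() == "true"


sample = (
    f'<math xmlns="{MATHML_NS}">'
    '<mfrac bevelled="true"><mi>a</mi><mi>b</mi></mfrac>'
    '<mfrac><mi>c</mi><mi>d</mi></mfrac>'
    '</math>'
)
root = ET.fromstring(sample)
fractions = root.findall(f"{{{MATHML_NS}}}mfrac")
print([is_bevelled(f) for f in fractions])
```

A translator that distinguishes the two cases could then emit the bevelled fraction line ("_/" in the expected output) only for elements where the check is true.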

Process terminated with status -1073741819 on Windows

According to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55 this is a memory access violation. I've only seen this happening in one specific case, on Windows. When I run the same conversion on macOS it works fine.
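
The decimal status is easier to look up once it is converted to the unsigned 32-bit hexadecimal form used in the NTSTATUS tables. A minimal sketch in Python; the function name exit_status_to_hex is made up.

```python
def exit_status_to_hex(status):
    """Render a signed 32-bit exit status as an unsigned hex NTSTATUS value."""
    return f"0x{status & 0xFFFFFFFF:08X}"


print(exit_status_to_hex(-1073741819))
```

The result, 0xC0000005, is the value listed as STATUS_ACCESS_VIOLATION.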

As is usually the case with memory errors I have no clue where to begin looking for the bug. The bug could be in Liblouis but I think chances are higher that it is a Liblouisutdml bug.

Does anyone have a list of unsolved memory related bugs that have been identified in Liblouisutdml? Do the sanitizer checks that have been added recently help?

some tests fail on some archs

Braille tables for simplified Chinese.

Hi:
I uploaded two braille tables to the project two years ago, but I know those files contain some errors. The two braille tables have now been completely remade. I downloaded the files currently included in liblouis to try them, and I find that they are simply built from pinyin. That is not enough for practical use, because the same character can have several pronunciations in simplified Chinese. So we made a much larger table to try to cover as many of them as possible. I will upload the tables here. Please tell me if there are any errors.
On the other hand, because of the complexity of simplified Chinese, we need to separate braille text by word or by character to help people understand the braille text. I know there is a lot of word segmentation software available. Apart from using such software, is there any similar function in liblouis to do that? Thanks for your reply.
zh-chn1.txt
zh-chn2.txt

Belgian math broken

Test 034 is failing because of liblouis/liblouis@507637b. The translator should be made less sensitive to changes in nl-chardefs.uti, in this case by changing

generic msup ,\ei\e^r,\ex

to

generic msup ,\ei\e@34r,\ex

in wiskunde.sem

Segmentation fault on large paragraphs

When translating an XML document that has a large amount of text in an area deemed a paragraph, file2brl is ending with a segmentation fault.

We are running liblouis 3.0.0 with liblouisutdml 2.6.0 HEAD on Linux. I've attached a sample XML file, the config file used, and shell script of the options we use for running file2brl.

The significant portion of the stderr output appears to be this, since I see the exact same pattern in other XML files that fail with the same error. The "inlen" is always the exact same number just before a failure, 8188:

...
Begin insert_text: node->content=367

Begin insert_translation
Performing translation: tableList=en-us-brf.dis,en-us-g2.ctb, inlen=8188
Inbuf=0x000A
...
47 0x0020 0x0020 0x0020 0x0020
Finished insert_translation
Begin write_paragraph
Begin insert_translation
Finished insert_translation, no text to translate
Begin end_style
Begin insert_translation
Finished insert_translation, no text to translate
Finish end_style
Finish write_paragraph
./convert-utdml.sh: line 9: 73318 Segmentation fault: 11  file2brl -f ./bookshare-refreshable.cfg -C

Thanks for any suggestions,
John Brugge

Handling of generic text files

Hi,

I have a suggestion for better handling of generic text files in Liblouisutdml:

Many text files are unformatted with respect to line width and page length. They simply have a \n or \r\n as an end-of-paragraph marker, and that is it. This applies to text files from various word processors as well as other text files.

Currently, these files need to be preprocessed before using them as input for file2brl. As far as I can see, there is currently no way to tell Liblouisutdml to interpret a line-feed as an end-of-paragraph marker. You need two line-feeds for that.

I would suggest an extra option in the cfg under OutputFormat, along with InputTextEncoding. Call it NewLineIsParagraph or whatever. When it is in effect, one line-feed should produce a new paragraph. Two line-feeds should produce a blank line in the output.
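
The proposed rule can be sketched as a small function. This is an illustration of the suggestion, not of how liblouisutdml reads text files; the name split_paragraphs is made up, and an empty string in the result stands for a blank line in the output.

```python
def split_paragraphs(text):
    """Treat every line-feed as a paragraph end; an empty line is a blank line."""
    lines = text.replace("\r\n", "\n").split("\n")
    # A trailing line-feed ends the last paragraph; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


sample = "First paragraph.\nSecond paragraph.\n\nThird, after a blank line.\n"
for paragraph in split_paragraphs(sample):
    print(repr(paragraph))
```

Both \n and \r\n input give the same result, so files from different word processors would be handled alike.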

Hope this makes sense. I am not sure if this should be posted as a ticket somewhere else, but I thought it might be a good idea to discuss it on the list.

Best regards
Bue Vester-Andersen

file2brl generates an incorrectly hyphenated word in a Hungarian Eurobraille document

Hi List,

In 2017 Norbert and I found an interesting situation when using file2brl with the following parameters:
file2brl -f hu.cfg -t test.html test.brf
If anybody would like to try reproducing or fixing this issue, I am attaching four files:
test.htm: the small source HTML document, cut down to the affected HTML part.
test.brf: the incorrectly generated Hungarian grade 1 braille document, whose 29th line contains the wrong Hungarian hyphenation.
hu.cfg: this file contains my Hungarian-specific preferences for file2brl.

On Linux, anybody can reproduce this issue by copying the hu.cfg file into the /usr/share/liblouisutdml/lbu_files directory and typing the following command:
file2brl -f hu.cfg -t test.htm test.brf

In the 29th line of the generated test.brf document, file2brl hyphenates the word "bekezdés" incorrectly.
In this case the hyphen character lands at the 32nd character position of the 29th line.

With liblouis I verified where the word bekezdés can be hyphenated in Hungarian; the following split is the correct hyphenation:
be-kez-dés
Because the lou_checkhyphens utility cannot test the word bekezdés (it contains an accented character), I wrote a small Python script to easily test any word in Hungarian.
The code is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import louis, sys

TABLES=['hu-hu-g1.ctb', 'hyph_hu_HU.dic']

def mark_hyphens(text):
    # Put a '-' in front of every character whose hyphen mask position is '1'.
    hyphen_mask=louis.hyphenate(TABLES, text, 0)
    return "".join(map(lambda a,b: "-"+a if b=='1' else a, text, hyphen_mask))

def hyphenate_word(word):
    try:
        hyphenated_word=mark_hyphens(word)
    except RuntimeError:
        # The whole word failed: hyphenate each '-'-separated part on its own
        # and join the parts with '-' again.
        parts=word.split('-')
        hyphenated_word="-".join(mark_hyphens(part) for part in parts)
    return hyphenated_word

word=sys.argv[1]
hyphenated_word=hyphenate_word(word)
print('normal word: '+word)
print('hyphenated word: '+hyphenated_word)

If I run the command python3 hyphenate.py bekezdés, I get the following correct output:
"normal word: bekezdés
hyphenated word: be-kez-dés"
I am attaching this small test program too.

The liblouis built-in hyphenate function confirms that the generated "beke-" hyphenation is not valid.
In the 29th line, the first valid hyphenation part that fits the maximum line length of 32 characters is "be-", and the "kezdés" part needs to go on the next line.
After manual correction, the correct braille output for the affected text in Eurobraille format, in Hungarian grade 1 braille, is the following:
"5qveg. $vajon e2 beh02"sos be-
ke2d1s le5-e?"
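
The expected line-breaking rule can be sketched as follows: break only at a syllable boundary, and pick the longest head that still fits in the room left on the line. This is an illustration, not liblouisutdml's algorithm; the function name split_at_hyphen is made up, and it assumes one braille cell per character, which holds for this word in grade 1 but not in general.

```python
def split_at_hyphen(syllables, room):
    """Return (head, tail); head ends in '-' and uses at most `room` cells."""
    best = 0
    for count in range(1, len(syllables)):
        # The head needs its characters plus one cell for the hyphen.
        if len("".join(syllables[:count])) + 1 <= room:
            best = count
    if best == 0:
        # No syllable boundary fits: move the whole word to the next line.
        return "", "".join(syllables)
    return "".join(syllables[:best]) + "-", "".join(syllables[best:])


syllables = ["be", "kez", "dés"]  # from the hyphenation mask: be-kez-dés
print(split_at_hyphen(syllables, 5))
print(split_at_hyphen(syllables, 6))
```

With five cells of room, "beke-" would fit by length but is not at a syllable boundary, so the sketch falls back to "be-". That is the behaviour described as correct above.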

How can this situation be prevented in automatic braille conversion? For example, is it possible to blacklist this wrong hyphenation, given that liblouis generates good hyphenation masks for this word?
In small texts this type of error is easy to correct, but in a large document, when the goal is a printable braille book, it is a very tedious task for the proofreaders.
There is a good chance that a large text will contain more issues of this type.

I am attaching the affected files.
Attila

LibLouisUTDML is not properly handling translations of annoref elements

What steps will reproduce the problem?
1.Translate the attached file using liblouisutdml
2.Examine the translation result from the second annoref tag
3.

What is the expected output? What do you see instead?
The document contains a single paragraph with two annoref elements.  Starting 
from the 2nd annoref element, translations are incorrectly placed through the 
rest of the document.  Incorrectly placed translations are 1 text element prior 
to where they should be located.  

What version of the product are you using? On what operating system?
current repo version, windows 7

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 16 Oct 2013 at 2:23

Attachments:

Stray non-breaking space in BRF output

I'm getting what I think is a stray non-breaking space in BRF output.

  1. I apply file2brf (Version 2.11.0) to an HTML file purpose-built for translation via this method.

  2. HTML contains

<div data-braille="tableofcontents">Contents</div>

  3. Semantic file contains

contentsheader div,data-braille,tableofcontents

  4. Output BRF has

,3t5ts

as the ToC header, where there is a single U+00A0 after the final "s" and before the newline. Clearly visible in my pager (less) and by other means.
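
One of the "other means" could be a byte-level scan of the BRF. The sketch below is plain Python and makes no assumption about the file's encoding; the function name find_non_ascii and the sample bytes are made up. In a single-byte encoding the stray character would show up as the byte 0xA0, and in UTF-8 as the pair 0xC2 0xA0.

```python
def find_non_ascii(data):
    """Return (line, column, byte) for every byte above 0x7F, 1-based."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


# Hypothetical sample: the ToC header followed by a single 0xA0 byte.
sample = b",3t5ts\xa0\n"
for line_no, col, byte in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: 0x{byte:02X}")
```

To scan a real file, read it in binary mode and pass the bytes to the function.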

I looked through source but couldn't see where a change could be made to test, and a pull request formulated.

Thanks for any help you can provide, this is causing me to use an incorrect encoding in a Python program that parses the BRF.

https://github.com/PreTeXtBook/pretext/blob/d402bdb3613d95984708150abe2fdb33123f565a/pretext/pretext.py#L2209

Extra space added to text within "a" tags

What steps will reproduce the problem?
1. Open the attached EPUB file in BrailleBlaster.
2. Move to line 78 and note that chapter is broken up with the c at the end of 
the line and the rest of the word on the next line. Brandon says this is 
because there is text within an "a" tag on that line. Liblouisutdml adds a 
space to that text, thereby throwing all the indices off. That causes this word 
to be split up.
3. Move to line 98 and you can see the same thing with the word body.

What is the expected output? What do you see instead? This space should not be 
added, the indices should stay correct and the words should not be split 
between lines.


What version of the product are you using? On what operating system? I am using 
the latest version of liblouisutdml built from the repository running on 64-bit 
Java 7 on Windows 7 x64.


Please provide any additional information below. There is probably a better way 
to duplicate this bug but I don't know enough about it to know how. John, you 
probably can figure it out from this description.


Original issue reported on code.google.com by [email protected] on 17 Oct 2013 at 9:22


The configtweak semantic action

Cut&paste from an email thread on the list.

I'm processing an HTML document with file2brl. In the html.sem file I
have the following line:

configtweak h4,class,cont literaryTextTable=no-g1.ctb

When I process the HTML document, I get a line

literaryTextTable=no-g1.ctb

converted into literary braille, at the point where

Heading text

is placed, directly followed by the
"Heading text". All text after this point is perfectly set according to
the table I specified. The problem is that the literaryTextTable=... is
printed out.

I got an answer from Paul Wood:

I know this doesn't FIX the bug, but what I have found is a) I can
reproduce the error and b) if no text is within the


then the extra text which is, I have confirmed, the text from the third
parameter is not added to the result. I imagine the programmer didn't
think that text would be present for the specified xml element used to
call the 'configtweak'. Can anyone remember who programmed this, maybe
they can shed some light on how to solve this prob?

I replied:

Yes

works. But strangely not if the tag is closed
with

, like

.

Issue for testing the issue tracker

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

This is just a test. I am sorry if it annoys anyone.

Original issue reported on code.google.com by [email protected] on 16 Oct 2013 at 11:02

Fails to build with liblouis 3.3

Hello,

The current git doesn't build with liblouis 3.3, let alone released versions :)

The issues we have spotted in Debian are

  • the logMessage -> _lou_logMessage renaming,
  • the use of ascii8 in louisutdml.h which conflicts with liblouis'
  • the use of FileInfo in readconfig.c and semantics.c, which conflicts with liblouis'

Samuel

ComputerBraille translation not put in correct location

What steps will reproduce the problem?
1. Translate the attached xml file
2. Examine the code element and all subsequent translations

What is the expected output? What do you see instead?
The xml file contains a single paragraph containing a code element.  From the code 
element on, translations no longer follow the correct text node and are in the 
wrong location.

What version of the product are you using? On what operating system?
Current repo, Windows 7

Please provide any additional information below.
This seems similar to other bugs in the past where translations were put in the 
incorrect location.

Original issue reported on code.google.com by [email protected] on 28 Oct 2013 at 5:33


Extra space in conversion to Nemeth

The general test (mathml_nemeth\test_general_001.test) fails:

Output ?1/2.P#!;0^2.P"?D.?/A+BSIN  .?# .K ?1/>A^2"-B^2"]#
Expected ?1/2.P#!;0^2.P"?D.?/A+BSIN .?# .K ?1/>A^2"-B^2"]#

Note: for some reason, github is deleting the extra space when I view it (at least in preview mode). The output has 2 spaces after SIN; there should only be one space.
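
Because renderers tend to collapse runs of spaces, it can help to compare the two strings programmatically. The following is a minimal sketch, not part of the liblouisutdml test harness. The helper names are hypothetical, and the output string is rebuilt from the expected one so that the doubled space survives whitespace collapsing.

```python
def first_difference(output: str, expected: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (a, b) in enumerate(zip(output, expected)):
        if a != b:
            return i
    if len(output) != len(expected):
        return min(len(output), len(expected))
    return None


def show_spaces(text: str) -> str:
    """Replace each space with the visible symbol U+2423."""
    return text.replace(" ", "\u2423")


expected = '?1/2.P#!;0^2.P"?D.?/A+BSIN .?# .K ?1/>A^2"-B^2"]#'
# Reported output: identical except for a second space after SIN.
output = expected.replace("SIN ", "SIN  ", 1)

i = first_difference(output, expected)
print(i)
print(show_spaces(output))
print(show_spaces(expected))
```

The printed index points at the extra space, and the two `show_spaces` lines make the doubled space visible even where ordinary spaces would be merged.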

Question - Is there any Python wrapper for liblouisutdml

I have liblouis working under Python but could not get Nemeth math to work. I just found out that this is not supported in liblouis and must be done in liblouisutdml with file2brl. Is there any chance of getting a Python wrapper for liblouisutdml, or are there technical limitations for this?
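
In the absence of a binding, one possible workaround is to shell out to file2brl from Python. The sketch below is only that, a workaround under assumptions: it uses the same argument form as the `file2brl -f preferences.cfg a.txt a.brf` invocation quoted in the `-r` report on this page, assumes file2brl is on the PATH, and uses hypothetical function names. It is not a wrapper around the library API.

```python
import subprocess


def file2brl_command(source, target, config="preferences.cfg"):
    """Build the argument list: file2brl -f <config> <source> <target>."""
    return ["file2brl", "-f", config, source, target]


def run_file2brl(source, target, config="preferences.cfg"):
    """Run file2brl; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(file2brl_command(source, target, config), check=True)


print(file2brl_command("a.txt", "a.brf"))
# ['file2brl', '-f', 'preferences.cfg', 'a.txt', 'a.brf']
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.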

Nested roots are not converted correctly to Nemeth

The nested root test (mathml_nemeth/test_msqrt_03.test) fails:

Output >A+>B+>C>D]]]]
Expected >A+.>B+..>C...>D...]..].]]

As with #62, it appears that the code is not dealing with nesting levels. Unlike fractions, which announce their complexity at the start/end, radicals count inwards, similar to scripts (i.e., the innermost radical repeats the indicator . to indicate the level of nesting).
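
The inward counting can be illustrated with a small recursive sketch. It is inferred only from the expected string in this report, not from liblouisutdml's code or the full Nemeth rules. The list-based input format and the function name are hypothetical.

```python
def render_radical(parts, depth=0):
    """Render a radical whose radicand is a list of strings and nested lists.

    Each nested list is an inner radical; depth 0 is the outermost one.
    """
    dots = "." * depth  # assumption: one dot per level of nesting
    body = "".join(
        render_radical(p, depth + 1) if isinstance(p, list) else p
        for p in parts
    )
    return f"{dots}>{body}{dots}]"


nested = ["A+", ["B+", ["C", ["D"]]]]
print(render_radical(nested))  # >A+.>B+..>C...>D...]..].]]
```

Passing the depth down the recursion is what distinguishes this from the reported output, where every level is written with a bare `>` and `]`.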

Documentation for API is of no value as it is wrong

The documentation of the programming API has no value as it is simply wrong. 
The function signatures given in the documentation do not even match the 
function signatures in the code (one such example is the lbu_translateFile 
function although this issue is not limited to just this one function), thus 
there are parameters which are not explained. There are also functions in the 
liblouisutdml.h file which simply do not appear in the documentation.

As one must refer to code files, the documentation does not serve its purpose.

The documentation needs to be updated to reflect the current API of 
liblouisutdml, if people are expected to be able to use the documentation.

Original issue reported on code.google.com by mwhapples on 14 Oct 2013 at 6:16

file2brl -r output is empty

The -r option is not working.

Testcase:

First generate the braille:

file2brl -f preferences.cfg a.txt a.brf

This step works as expected when using the "formatFor textDevice" output format. The a.brf file contains the braille content.

Now try to reformat the brf document:

file2brl -r -f preferences.cfg a.brf b.brf

A b.brf file is created, but the size of this file is 0 bytes.

The following temp files are created:

  • file2brl.temp: I think this file contains a braille document which equals the original document, but I am not fully sure.
  • file2brl2.temp: this file is an empty file.
  • lbx_body.temp: this is the back translated html document from a.brf file.

I am attaching a zip file containing the preferences.cfg file, the original testing document and the created .brf braille document.

Nested fractions are not converted correctly to Nemeth

Three nested fraction tests (mathml_nemeth/mfrac_04.test, mathml_nemeth/mfrac_06.test, mathml_nemeth/mfrac_07.test) fail:

Image | image | image | image
------------ | ------------- | ------------ | -------------
Output | ?A/B^,??3/4#/?5/6#,#"# | ?.P+?X/Y#/<M+N>13]# | ,??.P+?X/Y#/13#/Z#
Expected | ?A/B^,??3/4#,/?5/6#,#"# | ,?.P+?X/Y#,/<M+N>13],# | ,,?,?.P+?X/Y#,/13,#,,/Z,,#

There are some commonalities:

  1. The complex fraction slash indicator ,/ is not used (all three tests)
  2. The complex opening/closing indicators ,?/,# are missing in the second and third tests
  3. The even more nested open/closing indicators ,,?/,,# are missing in the third test

In all these cases, just the simple open/slash/close indicator is used.

It appears that the code is not looking at nesting levels.
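
A sketch of what looking at nesting levels could mean is given below. It is inferred from the expected strings of the second and third tests only and is not liblouisutdml's implementation. The `Frac` class and function names are hypothetical. The first test, where the complex fraction sits in a superscript and the outer fraction stays simple, is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import List, Union

Part = Union[str, "Frac"]


@dataclass
class Frac:
    numerator: List[Part]
    denominator: List[Part]


def highest_order(parts):
    """Highest fraction order among parts (0 if they contain no fraction)."""
    return max((fraction_order(p) for p in parts if isinstance(p, Frac)),
               default=0)


def fraction_order(frac):
    """1 for a simple fraction, 2 for complex, 3 for the next level up."""
    return 1 + max(highest_order(frac.numerator),
                   highest_order(frac.denominator))


def render(parts):
    return "".join(render_fraction(p) if isinstance(p, Frac) else p
                   for p in parts)


def render_fraction(frac):
    # Assumption: one comma per level above simple, on open, slash and close.
    commas = "," * (fraction_order(frac) - 1)
    return (f"{commas}?{render(frac.numerator)}"
            f"{commas}/{render(frac.denominator)}{commas}#")


inner = Frac(["X"], ["Y"])
middle = Frac([".P+", inner], ["13"])
outer = Frac([middle], ["Z"])
print(render_fraction(outer))  # ,,?,?.P+?X/Y#,/13,#,,/Z,,#
```

The order is computed bottom-up before anything is emitted, since the opening indicator of the outer fraction depends on what it contains.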

Typos in Documentation

Excerpt with typos marked bold:

"In addition to liblouisutdml.ini the distribution also sontains a number of configuration files . You should use this file as a refererence. It is the file read by the file2brl command-line interface program if no configuration file is giben."

call of unknown executable xml2brl

The shell scripts msword2brl, pdf2brl, and rtf2brl currently fail because of 
the missing file xml2brl. Shouldn't the calls of xml2brl be replaced with 
file2brl?

Original issue reported on code.google.com by [email protected] on 20 May 2011 at 1:21

lbu_devonly segfaults

When running lbu_devonly from the command-line, it crashes with a segfault 
because pointer "env" is not initialized: 
  - pointer variable "env" defined in file lbu_devonly.c, line 79
  - used in lbu_devonly.c, line 81
  - dereferenced in jliblouisutdml.c, line 40 => segfault

Original issue reported on code.google.com by [email protected] on 1 Apr 2011 at 7:38

Release Version Format

Hi, Would it be possible for you not to use hyphens in your tagged releases? It would make this library easier to package since most package managers do not allow hyphens in versions (which means currently it's difficult to automate the packaging of this library).

Paragraphs limited to 8192 characters in XML, but not in plain text

What steps will reproduce the problem?
1. Run the attached XML file through file2brl. 
2. Check the resulting BRF and where it ends, compared to the original text. 

What is the expected output? What do you see instead?
The string "Professor of Moral Culture" appears four times in the original, 
followed by a number of sentences. Instead of the entire text being translated 
the BRF copy ends at the last "Professor of Moral", leaving off the last 1,600 
or so characters in the paragraph. 

If the same text is copied to a simple text file (no XML), and run through 
file2brl as a text conversion, the complete text gets translated.

What version of the product are you using? On what operating system?
liblouis 2.5.3, liblouisutdml 2.4.0, running on either Ubuntu Linux or OS X.

Please provide any additional information below.
Attached is a fragment of text from the public domain work "The Complete 
Letters of Mark Twain." A single paragraph was copied and pasted four times to 
form a single paragraph of sufficient length to replicate this problem. I have 
also attached the BRF that we get when we convert it.

We are seeing this issue with numerous books in the Bookshare collection. The 
paragraphs that get truncated are long paragraphs in the original text (for 
instance, EPUB), so the paragraph size does not appear to be anything that we 
are extending or adding to through intermediate conversions within Bookshare.
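
To find affected books ahead of conversion, a check along the following lines may help. It is a sketch under assumptions: the 8192 figure is taken from the title of this report and is assumed to count characters, the element name `p` is hypothetical because the attached file is not shown here, and namespaced documents would need a qualified tag name.

```python
import xml.etree.ElementTree as ET

LIMIT = 8192  # from the issue title; assumed to be a character count


def long_elements(xml_text, tag="p", limit=LIMIT):
    """Return (index, length) for each <tag> element whose text exceeds limit."""
    root = ET.fromstring(xml_text)
    hits = []
    for index, elem in enumerate(root.iter(tag)):
        # itertext() also gathers text inside child elements of the paragraph.
        length = len("".join(elem.itertext()))
        if length > limit:
            hits.append((index, length))
    return hits


# Hypothetical document: one short paragraph and one very long one.
doc = ("<book><p>short</p><p>"
       + "Professor of Moral Culture. " * 400
       + "</p></book>")
print(long_elements(doc))  # [(1, 11200)]
```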

Original issue reported on code.google.com by [email protected] on 30 Sep 2013 at 9:20


MathML translations put mtext element translations in wrong location

What steps will reproduce the problem?
1. Run the attached file through liblouisutdml

What is the expected output? What do you see instead?
MText translation should be part of a single translation for the math element

What version of the product are you using? On what operating system?
Using the repo as of the date the issue was posted, on a Windows 7 machine.

Please provide any additional information below.
It is my understanding that MathML translations encapsulate the entire 
translation of a math element in a single brl tag.  This is not happening when 
a mtext element is inside a math element.  It produces unusual results.

Original issue reported on code.google.com by [email protected] on 10 Oct 2013 at 8:35
