Comments (13)

lmr avatar lmr commented on August 17, 2024

I've sent a patch to the user to see if it resolves the issue.

from avocado.

ldoktor avatar ldoktor commented on August 17, 2024

Hmm, I wonder which character it is. I stopped having those issues a long time ago...

lmr avatar lmr commented on August 17, 2024

@ldoktor It's a Chinese user. Considering that we have a lot of people in Beijing using avocado, I wonder how we didn't hit this problem before.

lmr avatar lmr commented on August 17, 2024

According to the user, the patch doesn't work. I need to find some time to troubleshoot the problem.

adereis avatar adereis commented on August 17, 2024

Feel free to hate me because of this, but I used to add this to my Python projects:

# fscking braindamaged python utf-8 handling :-/
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

And plenty of people agree with me:

http://stackoverflow.com/questions/28657010/dangers-of-sys-setdefaultencodingutf-8?answertab=active#tab-top

Whether it will work in avocado, which uses several third-party modules, I have no idea. But maybe users should give it a try when facing this kind of problem.

adereis avatar adereis commented on August 17, 2024

I ran tests with characters from several languages and with different LC_ALL settings. I couldn't crash avocado during a run or during html report generation, but I managed to crash it using avocado list áéçóúwhatever (the dir must exist).

adereis avatar adereis commented on August 17, 2024

and the workaround with

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

worked flawlessly in all of my tests (including when using avocado list as reported above)

ldoktor avatar ldoktor commented on August 17, 2024

Well, we already use utf8 for rendering the html results. The problem is that the data being rendered contains non-utf8 characters:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
unicode('\xff', 'utf8')

That's why I added extensive logs so we know where it comes from. I believe it's from json results, but I never got to the results myself...
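
As a rough illustration of that point, here is a minimal Python 3 sketch (the thread is about Python 2, so this is an analogue, not avocado code). It shows that a byte such as 0xff is not valid UTF-8 no matter which default encoding is configured, and that a lossy fallback is one possible way to keep going. The variable names are made up for the example.

```python
# Python 3 analogue of unicode('\xff', 'utf8') from the snippet above.
raw = b"\xff"

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as details:
    decoded_ok = False
    reason = details.reason  # short explanation supplied by the codec

# A lossy fallback: the invalid byte becomes U+FFFD instead of raising.
replaced = raw.decode("utf-8", errors="replace")

print(decoded_ok, repr(replaced))
```

The replacement character hides where the bad byte came from, so logging the raw bytes first is probably still worthwhile.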

Regarding avocado list, the problem is in the astring library:

diff --git a/avocado/utils/astring.py b/avocado/utils/astring.py
index 2c764e8..bfd46fa 100644
--- a/avocado/utils/astring.py
+++ b/avocado/utils/astring.py
@@ -25,6 +26,7 @@ string. Even with the dot notation, people may try to do things like
 And not notice until their code starts failing.
 """

+import codecs
 import os.path
 import re

@@ -156,7 +158,7 @@ def iter_tabular_output(matrix, header=None):
             lengths.append(len(column))
     for row in matrix:
         for i, column in enumerate(row):
-            column = unicode(column).encode("utf-8")
+            column = codecs.decode(column, "utf-8")
             col_len = len(column)
             try:
                 max_len = lengths[i]

This fixes it. I'm not an encoding expert, though; this is just something I used a while ago. I also usually use:

# -*- coding: utf8 -*-

but when I placed it into scripts/avocado, it didn't produce the proper results for the astring library. Maybe it can help in some other cases, though...
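
To make the difference between the two lines in the diff more concrete, here is a small Python 3 sketch. It is only an analogue: in Python 2, unicode(column) on a byte string decodes with the default ascii codec, which is mimicked here with an explicit decode("ascii"). The sample cell and variable names are assumptions for illustration, not avocado code.

```python
import codecs

# "žčřý" as UTF-8 bytes, similar to a path typed on the command line.
raw_cell = "\u017e\u010d\u0159\u00fd".encode("utf-8")

# Rough analogue of Python 2's unicode(column): decode with ASCII.
try:
    raw_cell.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Analogue of the patched line: decode with an explicitly named codec.
text_cell = codecs.decode(raw_cell, "utf-8")

byte_len = len(raw_cell)   # counts bytes
char_len = len(text_cell)  # counts characters, which a table layout needs

print(ascii_failed, byte_len, char_len)
```

Decoding before measuring also matters for the column widths computed in iter_tabular_output, since each of these characters takes two bytes in UTF-8.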

adereis avatar adereis commented on August 17, 2024

# -*- coding: utf8 -*-

This is for the code itself (the strings you write in your text editor). It has no relationship to what happens at runtime and doesn't affect any module.

The trick with sys.setdefaultencoding('utf-8') forces utf-8 to be used everywhere inside the python process. It's considered bad practice by many, but "it simply works(TM)". Python 2 string/utf8 handling is ridiculously insane.

ldoktor avatar ldoktor commented on August 17, 2024

Yep, the problem is that it won't fix the htmlplugin issue. So do you want me to push the unicode(column).encode("utf-8") vs. codecs.decode(column, "utf-8") modification, or do you want to bring up the defaultencoding approach at our meeting?

One potential problem I see is utf-16, which would give wrong results in avocado if you use the defaultencoding approach...
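
A minimal Python 3 sketch of the utf-16 concern, assuming the input bytes are UTF-16 (little endian) while the process treats everything as UTF-8. The sample strings are made up for illustration; this is an analogue of the Python 2 behaviour, not avocado code.

```python
# Sample inputs, chosen only for illustration.
ascii_text = "ab"
accented_text = "\u00e1\u00fd"  # "áý"

utf16_ascii = ascii_text.encode("utf-16-le")        # b"a\x00b\x00"
utf16_accented = accented_text.encode("utf-16-le")  # b"\xe1\x00\xfd\x00"

# ASCII-only text decodes without error, but with stray NUL characters.
silently_wrong = utf16_ascii.decode("utf-8")

# Accented text fails outright: 0xe1 starts a multi-byte UTF-8
# sequence that the following 0x00 cannot continue.
try:
    utf16_accented.decode("utf-8")
    accented_failed = False
except UnicodeDecodeError:
    accented_failed = True

print(repr(silently_wrong), accented_failed)
```

The first case is arguably the worse one, because nothing crashes and the wrong text simply flows through.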

ldoktor avatar ldoktor commented on August 17, 2024

Actually, it should be locale.getdefaultlocale()[1] or "ascii" to use the system encoding. It worked well (you can still trick it with LANG=C avocado list ěščě, but that is because xterm lets you write those chars). When you use LANG=cs_CZ.ISO8859-2 xterm and write those chars, it represents them correctly:

LANG=cs_CZ.ISO8859-2 xterm

avocado list ááýý -V
Type    Test
MISSING ááýý

ACCESS_DENIED: 0
BROKEN_SYMLINK: 0
EXTERNAL: 0
FILTERED: 0
INSTRUMENTED: 0
MISSING: 1
NOT_A_TEST: 0
SIMPLE: 0
VT: 0

LANG=cs_CZ.UTF-8 xterm

avocado list -V žčřý
Type    Test
MISSING žčřý

ACCESS_DENIED: 0
BROKEN_SYMLINK: 0
EXTERNAL: 0
FILTERED: 0
INSTRUMENTED: 0
MISSING: 1
NOT_A_TEST: 0
SIMPLE: 0
VT: 0

LANG=C xterm

# I was not able to write most `šěčř` chars, but those two (`áý`) were possible to pass and it still crashed avocado
avocado list -V áý
Avocado crashed unexpectedly: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)
You can find details in /var/tmp/avocado-traceback-2016-02-17_07:42:39-tpIBWa.log
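
The "system encoding, falling back to ascii" idea from the start of this comment can be sketched in Python 3 roughly as follows. Two assumptions are worth flagging: locale.getpreferredencoding(False) is swapped in for locale.getdefaultlocale()[1], because the latter is deprecated in recent Python 3 releases, and decode_argument is a hypothetical helper, not an avocado function.

```python
import locale


def system_encoding():
    """Return the locale's preferred encoding, or "ascii" if none is reported."""
    return locale.getpreferredencoding(False) or "ascii"


def decode_argument(raw, encoding=None):
    """Decode a raw command-line argument (bytes) to text.

    Hypothetical helper for illustration; text input is returned unchanged.
    """
    if not isinstance(raw, bytes):
        return raw
    return raw.decode(encoding or system_encoding())


# The same bytes read correctly only when the codec matches the terminal.
latin2_arg = b"\xe1\xe1\xfd\xfd"
print(decode_argument(latin2_arg, "iso8859-2"))

# Under a plain ASCII codec the first non-ASCII byte raises,
# with the same message as in the LANG=C crash above.
try:
    decode_argument(b"\xe1\xfd", "ascii")
except UnicodeDecodeError as details:
    print(details)
```

This only works when the locale truthfully describes what the terminal sends, which is exactly what the LANG=C xterm case breaks.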

ldoktor avatar ldoktor commented on August 17, 2024

Btw when I use your trick, the result is:

LANG=C xterm

avocado list -V ýá
Type    Test
MISSING <FD><E1>

ACCESS_DENIED: 0
BROKEN_SYMLINK: 0
EXTERNAL: 0
FILTERED: 0
INSTRUMENTED: 0
MISSING: 1
NOT_A_TEST: 0
SIMPLE: 0
VT: 0

And that's not what one would expect... So IMO the codecs solution is cleaner. No doubt it's harder, as ideally you should sanitize all user inputs (args).
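
One way to picture "sanitize all user inputs" is the Python 3 sketch below. It assumes the bytes shown as <FD><E1> above are ý and á in a single-byte encoding such as ISO8859-2, which matches the output but is not confirmed in the thread. sanitize_argument and its list of candidate codecs are hypothetical, and guessing encodings like this is a heuristic, not a guarantee.

```python
def sanitize_argument(raw, encodings=("utf-8", "iso8859-2")):
    """Decode raw argument bytes, trying each candidate codec in turn.

    Hypothetical helper: if every candidate fails, non-ASCII bytes are
    replaced with U+FFFD rather than raising.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("ascii", errors="replace")


raw_arg = b"\xfd\xe1"  # the bytes printed as <FD><E1> in the output above

# Not valid UTF-8, so the first candidate is skipped and the
# single-byte codec is used instead.
print(sanitize_argument(raw_arg))
```

UTF-8 is tried first because valid UTF-8 is unlikely to occur by accident, while a single-byte codec accepts almost any byte sequence.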

clebergnu avatar clebergnu commented on August 17, 2024

This should be fixed by the unicode work on the referenced PR.
