Comments (11)

jishac commented on June 8, 2024

The hard-coded use of utf-8 is likely the cause of bug #44 as well.

from offlineimap3.

sudipm-mukherjee commented on June 8, 2024

@thekix yes, sorry, I forgot to close it after packaging the fix in Debian.

sudipm-mukherjee commented on June 8, 2024

Another user reported the same issue at https://bugs.debian.org/981685

ahf commented on June 8, 2024

I'm affected by this as well. I'm using Danish characters in my email signature and have set send_charset = "us-ascii:iso-8859-1:utf-8" in my muttrc. This allows mutt to recode my emails down to the first character set that works, which in most cases is ISO-8859-1, which offlineimap3 seems unhappy about.
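To illustrate the "first character set that works" behaviour described above, here is a minimal Python sketch. It is not mutt's actual implementation; the helper name `pick_charset` and the sample strings are made up for this example.

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first charset in `candidates` that can encode `text`.

    Hypothetical helper that mimics the idea behind a send_charset list;
    it is not taken from mutt or offlineimap.
    """
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return charset
    raise ValueError("no candidate charset can encode the text")


print(pick_charset("Best regards"))              # us-ascii
print(pick_charset("Med venlig hilsen, Søren"))  # iso-8859-1
print(pick_charset("Price: 10 €"))               # utf-8
```

With a Danish signature the second entry is chosen, which is why such messages end up on disk as ISO-8859-1 rather than UTF-8.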

It's exciting to see the Python 3 version of OfflineIMAP!

jishac commented on June 8, 2024

Like @ahf, I have noticed this on sent mail using mutt as well.

Furthermore, the patch listed at Debian bug 981485 will cause problems when the message is synced to an IMAP server: because the encoding is hard-coded to utf-8, the content type declared in the email header no longer matches the actual encoding. In other words, the email will be encoded as 'utf-8' but declare 'iso-8859-1', resulting in mangled text when viewed in an email client.
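A small Python sketch of that mismatch, using a made-up name as sample text. It only demonstrates the general effect of decoding UTF-8 bytes with a declared ISO-8859-1 charset; it is not code from offlineimap or from the Debian patch.

```python
text = "Søren Jørgensen"

# Bytes as they would end up if the body were written out as UTF-8 ...
body = text.encode("utf-8")

# ... while the Content-Type header still declares the original charset.
declared_charset = "iso-8859-1"

# A mail client trusts the header and decodes with the declared charset.
shown = body.decode(declared_charset)
print(shown)  # SÃ¸ren JÃ¸rgensen

# Decoding with the charset that was really used recovers the text.
print(body.decode("utf-8"))  # Søren Jørgensen
```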

So a proper fix would either need to mangle the original message to change its declared encoding, or the code would need to track and store the encoding so that the message can be properly encoded/decoded at the various points throughout the software.

A workaround in the interim is to set send_charset = "us-ascii:utf-8" and avoid other charsets such as 'iso-8859-1'. Changing the mutt configuration to work around an offlineimap bug is not ideal, but it sidesteps the issue when composing messages in mutt for the time being, at the cost of a few extra bytes here and there.

Elvith commented on June 8, 2024

I am affected by this bug as well (the change of encoding from utf-8 to iso-8859-1 is not due to German umlauts in my case, but to the “&” character).

As the user who reported bug #44, I think the two bugs are indeed probably related.

thekix commented on June 8, 2024

Hello,

This bug is very interesting, but IMO it is hard to solve in the right way :-)

I will try to explain:

    # Interface from BaseFolder
    def getmessage(self, uid):
        """Return the content of the message."""

        filename = self.messagelist[uid]['filename']
        filepath = os.path.join(self.getfullname(), filename)
        file = open(filepath, 'rt')
        retval = file.read()
        file.close()
        # TODO: WHY are we replacing \r\n with \n here? And why do we
        #      read it as text?
        return retval.replace("\r\n", "\n")

This function reads the message as text ('rt') and then replaces the carriage returns (\r\n becomes \n). This is probably not the right way to do it; we should simply read the message as binary ('rb'), something like:

    # Interface from BaseFolder
    def getmessage(self, uid):
        """Return the content of the message."""

        filename = self.messagelist[uid]['filename']
        filepath = os.path.join(self.getfullname(), filename)
        # Read as binary so that no charset is assumed while loading the file.
        with open(filepath, 'rb') as file:
            return file.read()

The problem is that we then need to make more changes in other parts of the code. For example, we need to check the headers to read some values.
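As a rough illustration of reading header values from a message that was loaded as bytes, the standard library's email package can parse the raw data directly. This is only a sketch with an invented sample message; it is not the change that was made in offlineimap.

```python
from email import message_from_bytes, policy

# Invented sample message: ISO-8859-1 body, as mutt might write it.
raw = (
    b"From: sender@example.org\r\n"
    b"Date: Sat, 08 Jun 2024 10:00:00 +0000\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Med venlig hilsen, S\xf8ren\r\n"
)

# Parse the bytes without decoding the whole file with one fixed charset.
msg = message_from_bytes(raw, policy=policy.default)

date_header = str(msg["Date"])       # header value as text
charset = msg.get_content_charset()  # charset declared in Content-Type
body = msg.get_content()             # body decoded with the declared charset

print(date_header)
print(charset)
print(body)
```

The raw bytes stay untouched, so the declared charset and the actual encoding cannot drift apart just from loading the message.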

IMO, there are three options to solve the problem:

  1. Deep analysis of this code and rewrite some functions. Read it as binary,...
  2. Include a new option in the configuration file to specify the charset (like mutt)
  3. Try to detect the charset

I will try the last option, because it is backward compatible with offlineimap2 and faster to implement than option 1 (I am very, very busy these days).

Regards,
kix

PS. I won't close this bug, because I will try to look into option 1.
PS2. Please note that the new patch depends on a new library, chardet (python3-chardet in Debian). @sudipm-mukherjee, please check the Depends.
PS3. If the patch works for you, please add a smiley or something to this post (as feedback). Thanks.

jishac commented on June 8, 2024

@thekix - I have made significant progress on option 1 in #48, if you have time to give it a look. I can create a pull request if desired; I am just getting to testing the changes.

The problem with option 3 is that it won't work: a message can contain multiple encodings, so simply detecting one encoding doesn't help when you go to write the message back to the server. I will explain further in the comments on #53.
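To make the "multiple encodings" point concrete, here is a small sketch that builds a two-part message with the standard library and lists the charset of each part. The message content is invented for the example and the code is not from #48 or #53.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# One message, two text parts, each with its own declared charset.
outer = MIMEMultipart("mixed")
outer.attach(MIMEText("Med venlig hilsen, Søren", "plain", "iso-8859-1"))
outer.attach(MIMEText("Цена: 10 €", "plain", "utf-8"))

raw = outer.as_bytes()
parsed = message_from_bytes(raw)

charsets = []
texts = []
for part in parsed.walk():
    if part.get_content_maintype() != "text":
        continue  # skip the multipart container itself
    charset = part.get_content_charset()
    charsets.append(charset)
    # Each part must be decoded with its own charset, not a global guess.
    texts.append(part.get_payload(decode=True).decode(charset))

print(charsets)
print(texts)
```

A single detected charset for the whole file cannot describe both parts, which is why the per-part information has to come from the MIME headers.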

thekix commented on June 8, 2024

Hi @jishac

Of course, IMO option 1 is the best. I was checking your repo/patch: amazing!

Some comments:

Please double-check the syntax; for example, some spaces are missing here:

        msg.add_header(headername,headervalue)
        return msg.get_all(headername,[])

I think you are replacing the function "get_message_date()" (in emailutil.py):

                - message_timestamp = emailutil.get_message_date(content, 'Date')
                + message_timestamp = self.get_message_date(msg, 'Date')

IMO it is better to change it in the same file. Make sure that you change all the calls:

kix@inle:~/src/offlineimap3$ rgrep get_message_date * | grep -v binar
offlineimap/emailutil.py:def get_message_date(content, header='Date'):
offlineimap/folder/IMAP.py:            rtime = emailutil.get_message_date(content)
offlineimap/folder/Maildir.py:                message_timestamp = emailutil.get_message_date(content, 'Date')
offlineimap/folder/Maildir.py:                    message_timestamp = emailutil.get_message_date(
offlineimap/folder/Maildir.py:                datestr = emailutil.get_message_date(content)
offlineimap/folder/Maildir.py:                date = emailutil.get_message_date(content, 'Date')
offlineimap/folder/Maildir.py:                datestr = emailutil.get_message_date(content)
kix@inle:~/src/offlineimap3$ 
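For illustration only, a date helper that takes a parsed message object instead of raw content could look roughly like the sketch below. The signature and behaviour are assumptions made for this example; this is not the code from the patch under review, nor the existing emailutil.get_message_date().

```python
from email import message_from_bytes
from email.utils import mktime_tz, parsedate_tz


def get_message_date(msg, header="Date"):
    """Return the date in `header` as a UNIX timestamp, or None.

    Hypothetical variant that works on an email.message.Message object.
    """
    value = msg.get(header)
    if value is None:
        return None  # header is missing
    parsed = parsedate_tz(str(value))
    if parsed is None:
        return None  # header is present but not a parseable date
    return mktime_tz(parsed)


msg = message_from_bytes(
    b"Date: Sat, 08 Jun 2024 10:00:00 +0000\r\n\r\nbody\r\n"
)
print(get_message_date(msg))
```

Keeping such a helper in emailutil would mean the callers listed above only need to pass the message object instead of the raw content.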

Could you please create a new pull request with these changes, based on the current offlineimap status? (Remove my stuff and include your code.)

Again, thanks a lot for your amazing work!!

Best regards,
kix

thekix commented on June 8, 2024

Hello @sudipm-mukherjee

We can probably close this bug. Is that OK?

Regards!
kix

thekix commented on June 8, 2024

Thanks!!
