Giter VIP home page Giter VIP logo

Comments (5)

gwynne avatar gwynne commented on June 12, 2024 1

The single-byte collection identifier that the current version of MySQLNIO sends in the protocol HandshakeResponse41 packet is 255, which corresponds to utf8mb4_0900_ai_ci, which means that with MySQL 8.x you get the correct behavior every time.

Of course, that collation ID doesn't exist yet in 5.7 (or, for that matter, in even the most recent versions of MariaDB), so the server falls back on its default-configured character set instead. 5.7 has still been getting maintenance updates all this time (although it's soon to FINALLY be declared EOL and put out of its misery at long last); at some point and they changed the default for new installations (and most hosted solutions and UNIX distros in general patched that default in a much longer time ago), so even those still using it have a decent chance of lucking into a correct configuration. Unfortunately, many - such as yourself - still end up with utf8 (aka utf8mb3, the crippled half-Unicode BMP-only encoding MySQL fail-invented) or latin1 - and since by deliberate design Unicode's first 256 codepoints correspond exactly to those of ISO-Latin-1 (which in turn means the first 128 are the ASCII table), it's easy to not notice it for an extended period.

from mysql-kit.

dezinezync avatar dezinezync commented on June 12, 2024

Temporary workaround: Executing the following query switches the connection to use the desired charset and collation:

SET NAMES 'utf8mb4' COLLATE 'utf8mb4_general_ci';

from mysql-kit.

gwynne avatar gwynne commented on June 12, 2024

This is addressed by the mysql-nio rewrite I've 90% completed (and trust me, no one's more sick of hearing me say it's "almost done" for a year straight than I am 😓😆). I'll make sure this issue gets mentioned in the PR as soon as it's up.

In the meantime, you can work around it by issuing an appropriate SET NAMES query, but because of how Fluent's connection pools currently work, the only way this can work reliably is if you run it inside a .withConnection() closure wrapped around every single route handler (and/or other database usage). For the record, the query would be as simple as:

try await (db as! any MySQLDatabase).simpleQuery("SET NAMES utf8mb4", onRow: { _ in }).run()

Technical note: This is equally as much an issue caused by MySQL 5.7's epically outdated handling of Unicode as it is any fault of the (very shoddy) existing MySQLNIO implementation. For some gory details, have a look at the documentation comments I've included with the relevant logic in the MySQLNIO rewrite: https://gist.github.com/gwynne/95679fc31b6b897799684ad5b9073066. (Please note this will almost certainly not be the final code; as mentioned, the rewrite is still in progress.)

P.S.: As per the comments in the Gist, you probably want to use the utf8mb4_unciode_520_ci collation if you can't update to MySQL 8.x (which I would very strongly recommend if at all possible).

from mysql-kit.

dezinezync avatar dezinezync commented on June 12, 2024

@gwynne I really appreciate the super quick response, with details (especially on a weekend 🫡)

Glad to know this is being refactored and support will be landing soon.

In the meantime, you can work around it by issuing an appropriate SET NAMES query

Yes, I have a temporary workaround for this, but it'd be just cleaner to have upstream support.

As per the comments in the Gist, you probably want to use the utf8mb4_unciode_520_ci collation if you can't update to MySQL 8.x

I'll test with the suggested collation param and follow up shortly.

I'm unable to update to MySQL 8.x immediately, but this is scheduled for December. Do you reckon this is a non-issue when running MySQL 8.x or would the workaround still be necessary until your refactor is complete?

from mysql-kit.

dezinezync avatar dezinezync commented on June 12, 2024

As per the comments in the Gist, you probably want to use the utf8mb4_unciode_520_ci collation if you can't update to MySQL 8.x (which I would very strongly recommend if at all possible)

Tested locally updating MySQL to v8.0.3-rc (latest supported by the OS) and I can confirm this works without any workarounds.

I'll keep the workaround for my production MySQL server which I'm unable to update immediately to v8.

Of course, that collation ID doesn't exist yet in 5.7 (or, for that matter, in even the most recent versions of MariaDB), so the server falls back on its default-configured character set instead. 5.7 has still been getting maintenance updates all this time (although it's soon to FINALLY be declared EOL and put out of its misery at long last);

Yes, as per your suggestion, using utf8mb4_unicode_520_ci works as expected for now.

I'm closing this issue, and I look forward to your updates. I appreciate your help @gwynne.

from mysql-kit.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.