Giter VIP home page Giter VIP logo

Comments (55)

rkh avatar rkh commented on May 17, 2024 1

In Sinatra we adopted the "If I don't know the encoding, use UTF-8" way, which is why default_encoding was introduced in the first place. Encoding is a really hairy thing and the ruby behavior currently differs not only between Windows and Unix but also between Linux and OSX. Take into account that as soon as one string used in a template or the template itself is BINARY, Ruby will raise an exception on interpolation. It is hard already to have proper encoding for user input over HTTP (the HTTP client does not tell you what encoding form data is in). Also note that there was a common agreement on ruby-core to have 1.9.3 default to Unicode rather than making any default dependent on the OS. I would love to keep changes in a separate branch for now and have that branch intensely tested on different platforms with different settings. Targeting 1.4 is a good idea imo.

from tilt.

josh avatar josh commented on May 17, 2024

Summoning @wycats. He wanted this cleaned up before Rails could adopt Tilt. I plead ignorance.

from tilt.

judofyr avatar judofyr commented on May 17, 2024

Calling in some other template engine hackers who might be interested:

Tilt: @rtomayko
Sinatra: @rkh
Rails: @josh
Slim: @stonean, @minad, @fredwu
Haml: @nex3
Issue #66: @julian7

Tilt is working fine at the moment, so this isn't a pressing issue; take your time and let's come up with the best solution together :-)

from tilt.

judofyr avatar judofyr commented on May 17, 2024

The Padrino guys might be interested too I guess: @nesquena and @DAddYE

from tilt.

fibric avatar fibric commented on May 17, 2024

If you try to find file encoding (i.e. a client send you somehow, uploaded somewhere or brought to you by usb or whatnot) you may could get some help from
http://prometheus.rubyforge.org/cmess/

charset = CMess::GuessEncoding::Automatic.guess(input)

from tilt.

DAddYE avatar DAddYE commented on May 17, 2024

100% agree with @rkh, In my opinion we can wait some months and stress test it... or instead wait ruby 1.9.3.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

Related (extended) discussion about this same issue. Keep in mind while this started as a Padrino bug, it runs way deeper back to this issue with Tilt:

padrino/padrino-framework#519 (comment)

I think we do need to address this. As it stands templates cannot contain UTF-8 characters inside them if Encoding.default_internal is set to Encoding::UTF_8 unless an encoding is set to every single template explicitly. In Sinatra, this is hackily avoided since Encoding.default_internal is set to nil. But that decision will come back to bite us relatively quickly. Tilt needs to respect default_external and not default to parsing every template as ASCII-8BIT.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

Quoting @nex3 in that other thread because this seems to make the most sense:

The UTF-8-over-HTTP problem is going to exist just as much with BINARY templates as with default_external-encoded ones. Fundamentally, you just can't hide from users the fact that they must specify the encodings of their external files.

From a template engine's point of view, Rails' solution is pretty much ideal: use default_external (which defaults to UTF-8) when loading a template, but provide a hook for template engines to recognize encoding comments. If no encoding comment is found and the file isn't valid in default_external, throw an error.

from tilt.

rkh avatar rkh commented on May 17, 2024

@nex3, @judofyr, @nesquena, @wycats: What about https://gist.github.com/964024#file_tilt.diff (see discussion in Padrion issue).

from tilt.

nesquena avatar nesquena commented on May 17, 2024

I will run this against the simplest failing case I outlined here: padrino/padrino-framework#519 (comment) and see if it works later and report back. Thanks for whipping up this patch.

from tilt.

mislav avatar mislav commented on May 17, 2024

Agreed with @nesquena. Just got bit by the inability to render CoffeeScript templates which contain multibyte characters (Tilt reads them in as ASCII). Here's the monkeypatch:

Tilt::CoffeeScriptTemplate.class_eval do
  alias original_prepare prepare
  def prepare
    original_prepare
    data.force_encoding @default_encoding
  end
end

from tilt.

nesquena avatar nesquena commented on May 17, 2024

Yeah this is still a lingering problem although admittedly we fixed it hackily by simply no longer setting the correct encodings in 1.9.2. This has unfortunate ramifications but it is better then the alternative from the padrino developers perspective until this gets resolved at the Tilt level.

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

/cc @apotonick, @brianmario

Some more eyeballs on this issue potentially.

Also wanted to make sure everyone saw this link from @nesquena earlier since it has some really good discussion around how much we should rely on Encoding.default_external:

padrino/padrino-framework#519

@rkh How does that patch factor in? Do you feel like that addresses the main issues with template files on disk?

from tilt.

brianmario avatar brianmario commented on May 17, 2024

I mostly agree with @nex3 on padrino/padrino-framework#519 (comment) (also quoted by @nesquena above).

Unless you use an encoding detection library like charlock_holmes, the only way to reliably know what the actual encoding of the template off the filesystem is to have the coder specify it using a magic header. Encoding.default_external should be used as the fallback, and if it blows up I think it's perfectly reasonable to raise that to the user asking them to specify the encoding of the offending template using a magic header. They should only need to do that for templates that aren't compatible with Encoding.default_external.

Then again, a valid point could be made to say that we should never raise an exception but instead just do our best to display the content - even if it ends up corrupt once sent down to the browser. Maybe that could be a configurable switch? raise_encoding_errors or something? I could go either way, but adding more config options feels like it should be a last-resort.

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

Then again, a valid point could be made to say that we should never raise an exception but
instead just do our best to display the content - even if it ends up corrupt once sent down to
the browser

That decision can still be left up to the app, though, I think. If you wanted non-draconian encoding error handling you'd set things up to use BINARY everywhere, essentially mimicking 1.8 behavior.

This raises a really good point though. If we move to Encoding.default_external as a default for templates read from disk, which is starting to feel right to me, then shit is going to blow up on existing apps since we'll be draconian by default. I'm going to want to do a major rev on the version number if that's the case.

EDIT: s/Encoding.default_encoding/Encoding.default_external/

from tilt.

DAddYE avatar DAddYE commented on May 17, 2024

@brianmario 100% agree.

from tilt.

DAddYE avatar DAddYE commented on May 17, 2024

I don't know how much cost in terms of performances reading everything in BINARY we should check this; I really hate the magic comment & c. Btw setting Encoding.default_internal or Encoding.default_external is quite simple so the major problem for the developer is to have all files with the same encoding. I think this isn't and bad behavior almost for me. So, my vote is to use Encoding.default_external.

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

I'm leaning toward @rkh's patch with maybe an additional layer where you can override default_external specifically for tilt.

  • Add Tilt.default_external_encoding, nil by default.
  • Use Encoding.default_external when Tilt.default_external_encoding is nil.
  • Always use the :default_encoding option when given.
  • Always use magic comment when found in template (guess the engine needs to support this?)

This would let you Tilt.default_external_encoding = 'utf8' if you wanted to ensure templates are treated as utf-8 but don't want to change Encoding.default_external for whatever reason. You could also set Tilt.default_external_encoding = 'BINARY' to get the current behavior.

Tilt 1.4 could ship with Tilt.default_external_encoding = 'BINARY' and Tilt 2.0 would ship with Tilt.default_external_encoding = nil.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

@rtomayko Yeah I think we are all on the same page. The Tilt.default_external_encoding which will eventually default to Encoding.default_external is a fairly safe approach. This allows for a smooth transition from the existing behavior. I am happy to test any branch with these adjustments in Padrino to try and reproduce the error as outlined in the old issue. If it works there with utf-8 encodings set, then that should be a good indicator.

from tilt.

rkh avatar rkh commented on May 17, 2024

After figuring out the encoding, should we leave it by that or call String#encode (with no arguments) to convert everything to the same encoding internally? It's what Rails does and @wycats recommends, and should ease dealing with templates in different encoding (imagine templates shipping with a gem and such) but it might mess with the all-binary scenario.

from tilt.

brianmario avatar brianmario commented on May 17, 2024

FWIW the Encoding.default_external= method raises a warning with ruby -w. Not sure what ruby-core has planned for it, but that warning kinda makes me feel like they might disable the ability to set it from inside a running app (would have to be set from $LC_TYPE or $LANG I guess?)`. I don't know either way, but just wanted to make us all aware.

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

Blah. Yeah that's a whole other dimension to this I haven't considered. We may need Tilt.default_internal_encoding as well, with semantics similar to Tilt.default_external_encoding.

from tilt.

DAddYE avatar DAddYE commented on May 17, 2024

@rtomayko I suggest to keep it more strict only to Encoding.default_external, at the moment I don't know when can be useful have two separate encodings, but is true that we don't break compatibility.

Always use magic comment when found in template (guess the engine needs to support this?)

As said I don't love this, but maybe others would like to use it.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

Interesting, wasn't aware of that warning when setting default_external. We would still be able to access the default encoding type through that even if it isn't expected to be set explicitly. Relying on the default external encoding and encoding everything to that internally strikes me as the most consistent approach that is least likely to be an annoyance for the end user developer as @rkh noted above.

/cc @raggi (would be interested to hear your thoughts)

from tilt.

judofyr avatar judofyr commented on May 17, 2024

Always use magic comment when found in template (guess the engine needs to support this?)

Instead of this I think we should just have a method on Tilt::Template (detect_encoding maybe) which returns nil by default and can be overridden by the template mapper. It's up to the template mapper to decide if it wants to use magic comments or if it just defaults to BINARY.

from tilt.

judofyr avatar judofyr commented on May 17, 2024

After figuring out the encoding, should we leave it by that or call String#encode (with no arguments) to convert everything to the same encoding internally? It's what Rails does and @wycats recommends, and should ease dealing with templates in different encoding (imagine templates shipping with a gem and such) but it might mess with the all-binary scenario.

Well, the output from the template engine isn't guaranteed to be a string. It could be an object or an Array (think Rack-streaming) instead. Beside, it's pretty simple for anyone who uses Tilt to add an #encode at the end.

from tilt.

rkh avatar rkh commented on May 17, 2024

By the way: "@konstantinhaase any reason not to just extract what rails is doing? I spent a lot of time on it and don't want to deal with the bugs again." - @wycats - http://twitter.com/wycats/status/113458749854842880

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

@rkh Which is ... ?

from tilt.

DAddYE avatar DAddYE commented on May 17, 2024

Bugs?

from tilt.

rkh avatar rkh commented on May 17, 2024

@rtomayko: dunno, haven't looked into it yet.

-- Gesendet von meinem Palm Prē
Ryan Tomayko <[email protected]> schrieb am 13.09.2011 13:02:

@rkh Which is ... ?

Reply to this email directly or view it on GitHub:

#75 (comment)

from tilt.

judofyr avatar judofyr commented on May 17, 2024

I think it's this: https://github.com/rails/rails/blob/master/actionpack/lib/action_view/template.rb#L11

from tilt.

brianmario avatar brianmario commented on May 17, 2024

+1 to what rails is doing

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

@judofyr Thanks for running that down. Most of AV's behavior seems to make good sense with Tilt, although I think we need a way to enable / disable some of it.

To summarize AV's behavior:

Determining template source encoding:

  • Assume template source is encoded in Encoding.default_external.
  • Use generic Ruby style encoding comment (supported by all templates) if supplied in template source.
  • Use template engine specific encoding comment if supplied in template source.
  • Raise exception when template source is not valid in the determined encoding.

Template source transcoding:

  • Handled by AV before the template source is handed off to template engine regardless of where string comes from.
  • Force template source data string to determined encoding with String#force_encoding.
  • Convert to Encoding.default_internal with String#encode!().
  • Raise exception when template source cannot be converted.

Template source to Ruby source conversion:

  • This is relevant only to source generating templates like ERB
  • Pass template source encoded as Encoding.default_internal into template specific handler.
  • Template handler converts the template source to ruby source.
  • Handlers must return the ruby source string encoded as default_internal.
  • AV also includes a safe-measure call to encode!() to guarantee the ruby source is default_internal.
  • Create compiled render method via call to module_eval.

All that's left is calling the method, which should return a string encoded as default_internal.

AV seems to assume Encoding.default_internal will be set. Tilt needs to work when default_internal is nil, which complicates things somewhat.

One thing that absolutely should happen is: for anything that tilt reads from disk using an IO object it creates, do something like:

data.force_encoding(magic || default_encoding || Encoding.default_external)
data.encode!

Note that the transcode step is a noop when Encoding.default_internal is nil.

Some other things we should consider supporting:

  • The generic magic comment syntax. Would cause force_encoding(encoding) on all template source strings when detected.
  • Raising an exception when template source data is not valid in its encoding (configurable).

The main issue I'm struggling with in all this is conversion to default_internal of template source strings returned from a reader block. I really hoped we could punt that responsibility to the caller but it doesn't work out. Here's my reasoning:

  1. Evaluating all templates in utf-8 / default_internal regardless of external encoding is a worthwhile feature and makes sense for many environments.
  2. You can't get that behavior from IO's normal external to internal conversion because of magic comments (especially template specific ones).
  3. The caller can't be responsible for conversion because they might not know about template specific magic comments either, nor will they want to deal with them even when they do know.
  4. The only option remaining is for Tilt to be responsible for performing external to internal conversion.

Essentially I think we should use Encoding.default_internal as a flag that says "always convert template source from the detected template source encoding to default_internal" regardless of where the source comes from. Perhaps additionally an option is needed to turn that behavior off at the Template instance level; otherwise, evaluating templates under other encodings becomes impossible when Encoding.default_internal is set.

from tilt.

josh avatar josh commented on May 17, 2024

To some extent, encodings should be handled individually by each handler. Not saying tilt shouldn't do anything, but I think each handler class should opt into the magic comment syntax and automatic transcoding. Not everything needs this. The coffee handler only works with utf-8, you can't write CS in anything else. Also sass/less can handle encodings via @charset css directive.

Think the important part is defining the contact for how all handlers should be expected to behave.

Ryan's outline looks good.

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

@josh Good to know. Yeah, I think we can get some basic stuff into Template without too much work but then we're really going to need to audit each template engine individually.

from tilt.

apotonick avatar apotonick commented on May 17, 2024

On Thu, Sep 15, 2011 at 2:50 AM, Ryan Tomayko
[email protected]
wrote:

@josh Good to know. Yeah, I think we can get some basic stuff into Template without too much work but then we're really going to need to audit each template engine individually.

Are you talking about Tilt::Template or AV::Template here?

Reply to this email directly or view it on GitHub:
#75 (comment)

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

A more concrete proposal with code: #107

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

Sorry guys that I suddenly appear from my bear lair (actually I've encountered the problem yesterday and saw that almost everybody stopped discussing it by May), but ... can anybody say how to conquer this issue when I have it with Padrino stack? I use haml/markdown/ruby_vars in UTF-8. How to settle them all?

from tilt.

nesquena avatar nesquena commented on May 17, 2024

@argent-smith Try:

# config/boot.rb
Padrino.after_load do
  Encoding.default_internal = nil
end

does that help?

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

It helped until I used the single UTF-8 encoded variable. So now I have UTF-8 haml rendered well and "ncompatible character encodings: UTF-8 and ASCII-8BIT" when I try to do "= utf8_string_returning_method" in the haml template.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

The next step would be to try and create a simple stripped down test case of this failure, preferably in Sinatra as a reference. Then if you can, check out this tilt patch from @rtomayko: https://github.com/rtomayko/tilt/pull/107/files, and see if with that applied the issue goes away.

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

I'll finish the test suite by tomorrow.

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

See my test suite argent-smith/sinatra-test.

It looks like on clean Sinatra with stock Tilt the issue isn't reproduced: everything goes just fine. I'll make yet another test suite with Padrino and tell you what happens.

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

Well, after adding markdown to my test I've reproduced the issue. Actually markdown was in my app when I've first encountered the issue. The error doesn't depend on a particular markdown implementation: I've tried kramdown and redcarpet with the same result. For the further details see my test suite mentioned above.

from tilt.

nesquena avatar nesquena commented on May 17, 2024

Great, I was hoping you could isolate this test with sinatra alone, since Padrino doesn't really mess with encodings. So the above repo fails with stock Tilt with Sinatra? Can you try again with the patched Tilt that Ryan has a pull request for and see if that makes the issue go away?

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

I'll try it ASAP

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

@nesquena @rtomayko I've got a conflict when tried to merge encodings to master: see /lib/tilt/template.rb in gist 1374322. Would you tell me how to resolve it?

from tilt.

djanowski avatar djanowski commented on May 17, 2024

So... are we going to honor Encoding.default_external?

from tilt.

argent-smith avatar argent-smith commented on May 17, 2024

@djanowski doesn't help in my case (xcuse my stupidity ;-)

from tilt.

apohllo avatar apohllo commented on May 17, 2024

Any progress with that issue?

from tilt.

rue avatar rue commented on May 17, 2024

Is this seriously still unsolved? What do you need to get this done, I’ll help?

from tilt.

rtomayko avatar rtomayko commented on May 17, 2024

@rue Yes. I'd say the work is maybe half done and it's going to require a major rev to release due to the change in behavior. #107 is IMO still the furthest we've got on this and its not being actively worked on. I assume it's suffered from bit rot over the last year as well.

The docs added at the top of https://github.com/rtomayko/tilt/pull/107/files give a basic roadmap of what work remains. The two biggest issues are implementing the :transcode option and auditing every single template engine. If you'd like to help with either, fork and send a pull request with the encodings branch as the base.

from tilt.

minad avatar minad commented on May 17, 2024

@judofyr Any ideas how this will evolve?

from tilt.

svasva avatar svasva commented on May 17, 2024

I've just ran into this issue and I will just leave a workaround here that worked for me.
At the top of an .erb file: <% # encoding: utf-8 %>

from tilt.

judofyr avatar judofyr commented on May 17, 2024

Initial implementation merged in #175. Open a new issue if you have any problems.

from tilt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.