Giter VIP home page Giter VIP logo

Comments (32)

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Clinton Begin added a comment - 21/Dec/09 02:05 PM
Moving this to new feature request (multiple charset support). You can set the
character set of the Resources class (albeit statically) using setCharset (), 
which
should help you load resources in any consistent character set.

However, if you're using various XML files with different character sets, I 
don't
know what to tell you, other than this is fully within your control and I 
recommend
against it. 

Original comment by [email protected] on 17 May 2010 at 1:37

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Original comment by [email protected] on 17 May 2010 at 3:20

  • Added labels: Type-Feature, Priority-Low, Version-Release3.x, Component-SqlMaps
  • Removed labels: Type-Enhancement, Priority-Medium

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Original comment by [email protected] on 2 Jun 2010 at 4:44

  • Changed state: Accepted

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
The character set of the Resources class should use UTF-8 as default not system 
encoding unless specified 
otherwise. Isn't this the preferred way of encoding XML documents anyway? 

Original comment by [email protected] on 2 Jun 2010 at 8:47

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
You must load resources as InputStream, not as a Reader, and let the XML parser 
to deal with encodings.

Use "new InputSource(stream)" and it will deal with it.

You assumption that the XML files use system encoding is broken and is a can of 
worms. Especially, if an app is deployed across different systems with 
different default encodings e.g. Windows vs. Linux.

More over, some of your methods ignore the value of Resources.charset field, 
e.g. Resources.getUrlAsReader(String) ignores it.

Using mybatis 3.0.3.

Original comment by [email protected] on 30 Dec 2010 at 9:53

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
[deleted comment]

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Issue 189 has been merged into this issue.

Original comment by eduardo.macarron on 31 Dec 2010 at 2:45

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
I have tested this using spanish characters and it actually fails.
In the test attached a mapper.xml file containing a spanish surname "Marañon" 
cannot is readed wrong. The problem is that the file is in UTF-8 but the 
default encoding is used to read it so no-ANSI characters are read wrong.
Run the test to see how it fails.
The patch changes the readers by input streams so any well done XML will be 
read right.
If this patch is applied in 3.0.4 we should also change MyBatis-Spring to use 
Input Streams instead of readers.
I would like someone from the core team to review this before applying the 
patch.
Maybe it is not needed to change SqlSessionFactoryBuilder because it is 
supposed that mybatis-config.xml will not have special characters. But I would 
also change it for consistency.

Original comment by eduardo.macarron on 31 Dec 2010 at 2:51

  • Added labels: Priority-High
  • Removed labels: Priority-Low

Attachments:

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Fixed in r3471 (MyBatis) and r3472(MyBatis-Spring)

Original comment by eduardo.macarron on 31 Dec 2010 at 9:10

  • Changed state: Fixed
  • Added labels: Target-Release3.0.4

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Can anyone explain please why modifying the current behavior when reader is 
able to manage the
Charset encoding? see 
http://download.oracle.com/javase/6/docs/api/java/io/InputStreamReader.html#Inpu
tStreamReader(java.io.InputStream, java.nio.charset.Charset)

Original comment by [email protected] on 31 Dec 2010 at 10:00

  • Changed state: New

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
fine for this release, let see for next versions

Original comment by [email protected] on 31 Dec 2010 at 10:21

  • Changed state: Fixed

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Hi Simo. Let me explain it. In fact this is a quite frequent situation in our 
daily work. Spanish characters like ñ or the "tilde" ó is usually 
problematic. Lots of time we see that apps that work fine in dev environments 
fail when deployed to production systems that use US encoding.  

A InputStream just reads bytes, a Reader reads characters so it needs that you 
provide an encoding to know how to build them.  

So if I want to write a file with spanish characters I should encode it in 
iso-8859-1 or UTF-8. When reading it I must provide the encoding when building 
the Reader.  

This sentence assumes that the file (or whatever it is) uses the default jvm 
encoding
Reader r = new InputStreamReader(is) That will just work if the file is encoded 
in the same encoding than for the jvm.

XML parsers handle this situation. They read the XML prolog and decide which is 
the right encoding to apply when building the Reader. But they need to do it at 
byte level, before any conversion is done.

We could have added extra logic to do the same thing by hand but XML parser 
already handle that situation. So simply feeding them with an InputStream fixes 
all issues.

Run the test I attached with the unfixed revision and you will see that this 
happens.  

Original comment by eduardo.macarron on 31 Dec 2010 at 11:41

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
OK that's all clear, I had same experience with German characters, what is 
still not clear to me is why subclassing an InputStream wrapping a Reader (that 
sounds a little wired and and honestly I don't like) instead of replacing 
methods signature read(Reader) with read(InputStream).
So, ok for this release, but honestly I'll take a deeper look on it for next 
versions.

Original comment by [email protected] on 31 Dec 2010 at 11:56

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Let me put an example. There are at least two places where that wrapping you 
mention should be done:
1.- MapperAnnotationBuilder#loadXmlResource

This class loads an XML file after loading a mapper interface. The previous 
version did:
Reader reader = Resources.getResourceAsReader(type.getClassLoader(), 
xmlResource);

That is wrong because we cannot assume that xml file is in the default jvm 
encoding.
So we should open the file. Read the XML prolog. Get the right encoding. Close 
it and then open it again with a FileReader configured with the right encoding. 

But I suppose that is exactly what the parser does. So better let it do it. 

2.- SqlSessionFactoryBuilder

This class is public API. The manual reads:
String resource = "org/mybatis/example/Configuration.xml"; 
Reader reader = Resources.getResourceAsReader(resource); 
sqlMapper = new SqlSessionFactoryBuilder().build(reader); 

But this is wrong. The user should apply the right encoding, not the default 
one. 
Most of the times it will work (100% for US/UK users). So the wrapping should 
be done by the user but this is too complex for them, they will simply do what 
the manual says and sometimes it will fail and they wont know why.

It is true the mybatis-config.xml is very unlikely that it has special 
characters because it does not have SQL sentences, it is just config. It will 
fail if for example a Chinese user wants to name his mapper.xml files with 
Chinese characters. Sounls like a not a good practice indeed but why not 
letting them doing so if the jvm lets it?

I was thinking in letting this class untouched. But I think it is good that we 
encourage users to use always InputStreams that will always work fine so they 
will save suprises :)

In fact the manual should be changed to read:
String resource = "org/mybatis/example/mybatis-config.xml"; 
InputStream inputStream = Resources.getResourceAsStream(resource); 
sqlMapper = new SqlSessionFactoryBuilder().build(inputStream); 

Original comment by eduardo.macarron on 31 Dec 2010 at 12:13

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
I know what you mean, but again, these are good reasons just to fix&close the 
Issue and maintain the actual APIs - and nobody wants to change them at *this* 
stage, no? - and... opening a file, reading the XML prolog, etc etc is a 
*crazy* idea that I hope nobody wouldn't ever have applied ;)

Wrapping a Reader to an InputStream is a quick'n'dirty fix, for next release 
APIs *have* to be changed.

Original comment by [email protected] on 31 Dec 2010 at 1:40

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Ahh. I am sorry Simo, I thought you was saying that we _should_ do _just_ that 
wrapping. And you are telling exactly the opposite!! Sorry for that brick I 
wrote! :P

Yes, I do agree with that. But cannot think how we will solve that given that 
SqlSessionFactoryBuilder().build(reader) public API. I am afraid we should live 
with that method forever :(

Original comment by eduardo.macarron on 31 Dec 2010 at 1:46

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
ROFL, funny when people don't understand each other but they have exactly the 
same idea XD

I'm not sure we're in a SqlSessionFactoryBuilder#build(Reader) trap, I intend 
to solve it for next release.

Original comment by [email protected] on 31 Dec 2010 at 1:51

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Looking at r3471 (MyBatis), or specifically at XPathParser.java [1]
the XPathParser(String xml) constructor now does

  this.document = createDocument(new ByteArrayInputStream(xml.getBytes()));

where xml.getBytes() uses system encoding. (Which does not necessary able to 
represent all the chars in the original String.) It looks like a legitimate use 
case for using a Reader.

So, a question: maybe it is feasible to provide in parallel pairs of methods: 
one that has an InputStream and another with a Reader.  (Or maybe just pass 
around an org.xml.sax.InputSource).

[1] 
http://code.google.com/p/mybatis/source/browse/trunk/src/main/java/org/apache/ib
atis/parsing/XPathParser.java?spec=svn3471&r=3471

Original comment by [email protected] on 31 Dec 2010 at 5:38

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Hi knst. I think there is no problem with that. xml string is already a String 
so it is always unicode so I am pretty sure it will work fine.

Original comment by eduardo.macarron on 31 Dec 2010 at 5:51

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
> xml string is already a String so it is always unicode
String.getBytes() call will translate it back to bytes (using system encoding), 
after that it will no longer be Unicode. Untranslatable characters will be 
replaced by whatever default replacement for them is (e.g. by '?').
Then an Xml parser will read it back.
If your system encoding is not Unicode-compatible (not UTF-8), or if the parser 
fails to guess the encoding (e.g. the string has <?xml encoding="..">), then it 
will be broken.

Original comment by [email protected] on 31 Dec 2010 at 6:31

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Yep. Thats right. But that will also happen with a xml file that is not in the 
default encoding and has no prolog.

Anyway knst. That methods are not used at all. So we better mark them for 
removal on the next version.

Original comment by eduardo.macarron on 31 Dec 2010 at 6:35

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
If they are not used, then OK. May mark them as deprecated and remove when 
possible.

Original comment by [email protected] on 31 Dec 2010 at 6:44

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Hi knst, finally I followed your advice and removed that ByteArrayInputStream 
for a InputSource. Thanks a lot for your help!

Yesterday MyBatis 3.0.4 was released with this change. I would be great if 
people who stared this issue give it a try to see if encoding issues are gone.

Original comment by eduardo.macarron on 1 Jan 2011 at 10:24

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Looking at r3525,

- this.document = createDocument(new ByteArrayInputStream(xml.getBytes()));
+ this.document = createDocument(new InputSource(xml));

The InputSource(String) constructor serves different purpose: it expects URI of 
the XML file (i.e. its URL, its filename). I would expect the following code 
there:

+ this.document = createDocument(new InputSource(new StringReader(xml)));


Regarding "Marañón" string in EncodingTest.java in r3525:

It relies on "source encoding" option of a Java compiler to be set correctly.
It would be safer to encode those characters as \uXXXX. That would be 
"Mara\u00f1\u00f3n". (The \uXXXX -> char conversion is performed when the 
source file is being read by compiler, so \u sequences can be used in any place 
of the source file, not just is string constants).

Original comment by [email protected] on 1 Jan 2011 at 1:29

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Thanks again knst! I was indeed looking for that because yersterday the build 
failed on Simo's machine because both had different codepages in our IDEs.

I will fix both things when I get home.

Original comment by eduardo.macarron on 1 Jan 2011 at 2:30

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Simo, we still have pending a discussion about removing the ReaderInputStream.

I have been taking a look again at the parsing code and I think you were 100% 
right and it not only can be removed but it should be.

There are two points where a reader/is is created: in MyBatis internals 
(XmlConfigBuilder and AnnotationBuilder) and by the user through 
SqlSessionFactoryBuilder.

The first one should always be an InputStream, because MyBatis will never know 
which is the file encoding. No problem with that, right now MyBatis just uses 
InputStreams internally and the Reader constructor is marked as deprecated 
(anyway it is not called anyware)

The second one (a user provided reader) does not need to be converted to an 
InputStream at all. In fact it should not. We are supposing that the user 
always uses the wrong encoding when creating a reader!! But that is not fair. 
Maybe he knows well what he is doing :) So in that case the reader should make 
its way without conversion until the XPathParser. 

As a summary:
- XmlConfigBuilder (which gets a user created stream):
 - should be able to accept a Reader. 
 - must never convert it to an IS (so ReaderInputStream should not be called)
 - the Reader constructor should not be deprecated.
- XmlMapperBuilder (which get a mybatis created stream):
 - should never accept a Reader 
 - the Reader constructor should stay deprecated
 - although it does not matter, the call to ReaderInputStream can also be removed

As a result ReaderInputStream is not needed at all.

I will provide a patch for this this afternoon.


Original comment by eduardo.macarron on 5 Jan 2011 at 1:05

  • Changed state: New

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
There goes the patch. Please have a look at it. 

Original comment by eduardo.macarron on 5 Jan 2011 at 3:02

Attachments:

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Wow, lots of comments on this one... I didn't read them all... but in a 
nutshell... I agree with tearing out Readers and the ReaderInputStream... I 
remember adding that way back in the ibatis 1.0 days, and I can't believe it's 
survived all the way to 3.x.  I suppose "how XML is read" fell to the bottom of 
my attention filter. :-)


Original comment by [email protected] on 5 Jan 2011 at 6:58

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Ok, have a look at r3564. We should change the user manual to encourage users 
to provide InputStreams to SqlSessionFactoryBuilder.

Original comment by eduardo.macarron on 5 Jan 2011 at 7:33

from mybatis.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Original comment by eduardo.macarron on 5 Jan 2011 at 8:17

  • Changed state: Fixed

from mybatis.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.