Comments (32)
Clinton Begin added a comment - 21/Dec/09 02:05 PM
Moving this to new feature request (multiple charset support). You can set the
character set of the Resources class (albeit statically) using setCharset (),
which
should help you load resources in any consistent character set.
However, if you're using various XML files with different character sets, I
don't
know what to tell you, other than this is fully within your control and I
recommend
against it.
Original comment by [email protected]
on 17 May 2010 at 1:37
from mybatis.
Original comment by [email protected]
on 17 May 2010 at 3:20
- Added labels: Type-Feature, Priority-Low, Version-Release3.x, Component-SqlMaps
- Removed labels: Type-Enhancement, Priority-Medium
from mybatis.
Original comment by [email protected]
on 2 Jun 2010 at 4:44
- Changed state: Accepted
from mybatis.
The character set of the Resources class should use UTF-8 as default not system
encoding unless specified
otherwise. Isn't this the preferred way of encoding XML documents anyway?
Original comment by [email protected]
on 2 Jun 2010 at 8:47
from mybatis.
You must load resources as InputStream, not as a Reader, and let the XML parser
to deal with encodings.
Use "new InputSource(stream)" and it will deal with it.
You assumption that the XML files use system encoding is broken and is a can of
worms. Especially, if an app is deployed across different systems with
different default encodings e.g. Windows vs. Linux.
More over, some of your methods ignore the value of Resources.charset field,
e.g. Resources.getUrlAsReader(String) ignores it.
Using mybatis 3.0.3.
Original comment by [email protected]
on 30 Dec 2010 at 9:53
from mybatis.
[deleted comment]
from mybatis.
[deleted comment]
from mybatis.
[deleted comment]
from mybatis.
Issue 189 has been merged into this issue.
Original comment by eduardo.macarron
on 31 Dec 2010 at 2:45
from mybatis.
I have tested this using spanish characters and it actually fails.
In the test attached a mapper.xml file containing a spanish surname "Marañon"
cannot is readed wrong. The problem is that the file is in UTF-8 but the
default encoding is used to read it so no-ANSI characters are read wrong.
Run the test to see how it fails.
The patch changes the readers by input streams so any well done XML will be
read right.
If this patch is applied in 3.0.4 we should also change MyBatis-Spring to use
Input Streams instead of readers.
I would like someone from the core team to review this before applying the
patch.
Maybe it is not needed to change SqlSessionFactoryBuilder because it is
supposed that mybatis-config.xml will not have special characters. But I would
also change it for consistency.
Original comment by eduardo.macarron
on 31 Dec 2010 at 2:51
- Added labels: Priority-High
- Removed labels: Priority-Low
Attachments:
from mybatis.
Fixed in r3471 (MyBatis) and r3472(MyBatis-Spring)
Original comment by eduardo.macarron
on 31 Dec 2010 at 9:10
- Changed state: Fixed
- Added labels: Target-Release3.0.4
from mybatis.
Can anyone explain please why modifying the current behavior when reader is
able to manage the
Charset encoding? see
http://download.oracle.com/javase/6/docs/api/java/io/InputStreamReader.html#Inpu
tStreamReader(java.io.InputStream, java.nio.charset.Charset)
Original comment by [email protected]
on 31 Dec 2010 at 10:00
- Changed state: New
from mybatis.
fine for this release, let see for next versions
Original comment by [email protected]
on 31 Dec 2010 at 10:21
- Changed state: Fixed
from mybatis.
Hi Simo. Let me explain it. In fact this is a quite frequent situation in our
daily work. Spanish characters like ñ or the "tilde" ó is usually
problematic. Lots of time we see that apps that work fine in dev environments
fail when deployed to production systems that use US encoding.
A InputStream just reads bytes, a Reader reads characters so it needs that you
provide an encoding to know how to build them.
So if I want to write a file with spanish characters I should encode it in
iso-8859-1 or UTF-8. When reading it I must provide the encoding when building
the Reader.
This sentence assumes that the file (or whatever it is) uses the default jvm
encoding
Reader r = new InputStreamReader(is) That will just work if the file is encoded
in the same encoding than for the jvm.
XML parsers handle this situation. They read the XML prolog and decide which is
the right encoding to apply when building the Reader. But they need to do it at
byte level, before any conversion is done.
We could have added extra logic to do the same thing by hand but XML parser
already handle that situation. So simply feeding them with an InputStream fixes
all issues.
Run the test I attached with the unfixed revision and you will see that this
happens.
Original comment by eduardo.macarron
on 31 Dec 2010 at 11:41
from mybatis.
OK that's all clear, I had same experience with German characters, what is
still not clear to me is why subclassing an InputStream wrapping a Reader (that
sounds a little wired and and honestly I don't like) instead of replacing
methods signature read(Reader) with read(InputStream).
So, ok for this release, but honestly I'll take a deeper look on it for next
versions.
Original comment by [email protected]
on 31 Dec 2010 at 11:56
from mybatis.
Let me put an example. There are at least two places where that wrapping you
mention should be done:
1.- MapperAnnotationBuilder#loadXmlResource
This class loads an XML file after loading a mapper interface. The previous
version did:
Reader reader = Resources.getResourceAsReader(type.getClassLoader(),
xmlResource);
That is wrong because we cannot assume that xml file is in the default jvm
encoding.
So we should open the file. Read the XML prolog. Get the right encoding. Close
it and then open it again with a FileReader configured with the right encoding.
But I suppose that is exactly what the parser does. So better let it do it.
2.- SqlSessionFactoryBuilder
This class is public API. The manual reads:
String resource = "org/mybatis/example/Configuration.xml";
Reader reader = Resources.getResourceAsReader(resource);
sqlMapper = new SqlSessionFactoryBuilder().build(reader);
But this is wrong. The user should apply the right encoding, not the default
one.
Most of the times it will work (100% for US/UK users). So the wrapping should
be done by the user but this is too complex for them, they will simply do what
the manual says and sometimes it will fail and they wont know why.
It is true the mybatis-config.xml is very unlikely that it has special
characters because it does not have SQL sentences, it is just config. It will
fail if for example a Chinese user wants to name his mapper.xml files with
Chinese characters. Sounls like a not a good practice indeed but why not
letting them doing so if the jvm lets it?
I was thinking in letting this class untouched. But I think it is good that we
encourage users to use always InputStreams that will always work fine so they
will save suprises :)
In fact the manual should be changed to read:
String resource = "org/mybatis/example/mybatis-config.xml";
InputStream inputStream = Resources.getResourceAsStream(resource);
sqlMapper = new SqlSessionFactoryBuilder().build(inputStream);
Original comment by eduardo.macarron
on 31 Dec 2010 at 12:13
from mybatis.
I know what you mean, but again, these are good reasons just to fix&close the
Issue and maintain the actual APIs - and nobody wants to change them at *this*
stage, no? - and... opening a file, reading the XML prolog, etc etc is a
*crazy* idea that I hope nobody wouldn't ever have applied ;)
Wrapping a Reader to an InputStream is a quick'n'dirty fix, for next release
APIs *have* to be changed.
Original comment by [email protected]
on 31 Dec 2010 at 1:40
from mybatis.
Ahh. I am sorry Simo, I thought you was saying that we _should_ do _just_ that
wrapping. And you are telling exactly the opposite!! Sorry for that brick I
wrote! :P
Yes, I do agree with that. But cannot think how we will solve that given that
SqlSessionFactoryBuilder().build(reader) public API. I am afraid we should live
with that method forever :(
Original comment by eduardo.macarron
on 31 Dec 2010 at 1:46
from mybatis.
ROFL, funny when people don't understand each other but they have exactly the
same idea XD
I'm not sure we're in a SqlSessionFactoryBuilder#build(Reader) trap, I intend
to solve it for next release.
Original comment by [email protected]
on 31 Dec 2010 at 1:51
from mybatis.
Looking at r3471 (MyBatis), or specifically at XPathParser.java [1]
the XPathParser(String xml) constructor now does
this.document = createDocument(new ByteArrayInputStream(xml.getBytes()));
where xml.getBytes() uses system encoding. (Which does not necessary able to
represent all the chars in the original String.) It looks like a legitimate use
case for using a Reader.
So, a question: maybe it is feasible to provide in parallel pairs of methods:
one that has an InputStream and another with a Reader. (Or maybe just pass
around an org.xml.sax.InputSource).
[1]
http://code.google.com/p/mybatis/source/browse/trunk/src/main/java/org/apache/ib
atis/parsing/XPathParser.java?spec=svn3471&r=3471
Original comment by [email protected]
on 31 Dec 2010 at 5:38
from mybatis.
Hi knst. I think there is no problem with that. xml string is already a String
so it is always unicode so I am pretty sure it will work fine.
Original comment by eduardo.macarron
on 31 Dec 2010 at 5:51
from mybatis.
> xml string is already a String so it is always unicode
String.getBytes() call will translate it back to bytes (using system encoding),
after that it will no longer be Unicode. Untranslatable characters will be
replaced by whatever default replacement for them is (e.g. by '?').
Then an Xml parser will read it back.
If your system encoding is not Unicode-compatible (not UTF-8), or if the parser
fails to guess the encoding (e.g. the string has <?xml encoding="..">), then it
will be broken.
Original comment by [email protected]
on 31 Dec 2010 at 6:31
from mybatis.
Yep. Thats right. But that will also happen with a xml file that is not in the
default encoding and has no prolog.
Anyway knst. That methods are not used at all. So we better mark them for
removal on the next version.
Original comment by eduardo.macarron
on 31 Dec 2010 at 6:35
from mybatis.
If they are not used, then OK. May mark them as deprecated and remove when
possible.
Original comment by [email protected]
on 31 Dec 2010 at 6:44
from mybatis.
Hi knst, finally I followed your advice and removed that ByteArrayInputStream
for a InputSource. Thanks a lot for your help!
Yesterday MyBatis 3.0.4 was released with this change. I would be great if
people who stared this issue give it a try to see if encoding issues are gone.
Original comment by eduardo.macarron
on 1 Jan 2011 at 10:24
from mybatis.
Looking at r3525,
- this.document = createDocument(new ByteArrayInputStream(xml.getBytes()));
+ this.document = createDocument(new InputSource(xml));
The InputSource(String) constructor serves different purpose: it expects URI of
the XML file (i.e. its URL, its filename). I would expect the following code
there:
+ this.document = createDocument(new InputSource(new StringReader(xml)));
Regarding "Marañón" string in EncodingTest.java in r3525:
It relies on "source encoding" option of a Java compiler to be set correctly.
It would be safer to encode those characters as \uXXXX. That would be
"Mara\u00f1\u00f3n". (The \uXXXX -> char conversion is performed when the
source file is being read by compiler, so \u sequences can be used in any place
of the source file, not just is string constants).
Original comment by [email protected]
on 1 Jan 2011 at 1:29
from mybatis.
Thanks again knst! I was indeed looking for that because yersterday the build
failed on Simo's machine because both had different codepages in our IDEs.
I will fix both things when I get home.
Original comment by eduardo.macarron
on 1 Jan 2011 at 2:30
from mybatis.
Simo, we still have pending a discussion about removing the ReaderInputStream.
I have been taking a look again at the parsing code and I think you were 100%
right and it not only can be removed but it should be.
There are two points where a reader/is is created: in MyBatis internals
(XmlConfigBuilder and AnnotationBuilder) and by the user through
SqlSessionFactoryBuilder.
The first one should always be an InputStream, because MyBatis will never know
which is the file encoding. No problem with that, right now MyBatis just uses
InputStreams internally and the Reader constructor is marked as deprecated
(anyway it is not called anyware)
The second one (a user provided reader) does not need to be converted to an
InputStream at all. In fact it should not. We are supposing that the user
always uses the wrong encoding when creating a reader!! But that is not fair.
Maybe he knows well what he is doing :) So in that case the reader should make
its way without conversion until the XPathParser.
As a summary:
- XmlConfigBuilder (which gets a user created stream):
- should be able to accept a Reader.
- must never convert it to an IS (so ReaderInputStream should not be called)
- the Reader constructor should not be deprecated.
- XmlMapperBuilder (which get a mybatis created stream):
- should never accept a Reader
- the Reader constructor should stay deprecated
- although it does not matter, the call to ReaderInputStream can also be removed
As a result ReaderInputStream is not needed at all.
I will provide a patch for this this afternoon.
Original comment by eduardo.macarron
on 5 Jan 2011 at 1:05
- Changed state: New
from mybatis.
There goes the patch. Please have a look at it.
Original comment by eduardo.macarron
on 5 Jan 2011 at 3:02
Attachments:
from mybatis.
Wow, lots of comments on this one... I didn't read them all... but in a
nutshell... I agree with tearing out Readers and the ReaderInputStream... I
remember adding that way back in the ibatis 1.0 days, and I can't believe it's
survived all the way to 3.x. I suppose "how XML is read" fell to the bottom of
my attention filter. :-)
Original comment by [email protected]
on 5 Jan 2011 at 6:58
from mybatis.
Ok, have a look at r3564. We should change the user manual to encourage users
to provide InputStreams to SqlSessionFactoryBuilder.
Original comment by eduardo.macarron
on 5 Jan 2011 at 7:33
from mybatis.
Original comment by eduardo.macarron
on 5 Jan 2011 at 8:17
- Changed state: Fixed
from mybatis.
Related Issues (20)
- [mybatis-spring] Mapper scanning should not rely on proprietary @Mapper annotation HOT 1
- Migrations archive unzips to "ibatis-migrations" instead of "mybatis-migrations" HOT 2
- OSGI Support HOT 4
- Remove mapper order restriction when referencing SQL fragment in another file. HOT 40
- Non-existent object id is incorrectly resolved when there is an entry with the same short name. HOT 13
- Allow applying @Transactional to types and interfaces and add a @NoTransaction attribute HOT 4
- Please prioritize issues HOT 1
- Adjust issue's status HOT 1
- Unable to get auto-generated ID with mysql using annotations HOT 1
- Mybatis-spring doesn't propagate exceptions up the stack when using batch mode HOT 17
- how to free temp tablespace allocated for a select stmt return parameter HOT 1
- rollback on BatchExecutor executes batched statements HOT 8
- resultMap doesn't support mapping of unnamed columns by index HOT 2
- mybatis3.0.3 bug in WebService ?? HOT 2
- Annotation-based and AOP-based sessions HOT 3
- Global parameter typeHandler not invoked in case of null values HOT 2
- Exception: Result Maps collection does not contain value for test.User-result HOT 2
- Handling of update counts before result sets in select HOT 8
- configure behavior when all columns values are null for some rows HOT 5
- queryForMap does not use a column as map HOT 14
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mybatis.