Comments (11)
Weirdly enough, this does not seem to be a problem with Linux or Mac, only Windows. I updated configuration files so BEAST and BEAUti should recognise UTF-8 under Windows at the next release.
from beast2.
Hi @jjmccollum you're not missing anything, UTF-8-encoded files are permitted. Can you please provide a minimal example XML which produces the error? Thanks.
from beast2.
@tgvaughan Apologies that it took me so long to come up with one! It turns out that very little is needed to reproduce the error:
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<beast version="2.7" namespace="beast.pkgmgmt:beast.base.core:beast.base.inference:beast.base.evolution:beast.base.evolution.alignment:beast.base.evolution.datatype:beast.base.evolution.tree:beast.base.evolution.tree.coalescent:beast.base.evolution.branchratemodel:beast.base.inference.util:beast.evolution.nuc:beast.base.evolution.operator:beast.base.inference.operator:beast.base.evolution.sitemodel:beast.base.evolution.substitutionmodel:beast.base.evolution.likelihood">
<data spec="Alignment" id="alignment" dataType="standard" statecount="2">
<!-- Remove the following line, and the SAXParseException will not be thrown: -->
<charstatelabels spec="UserDataType" characterName="B10K1V18U2-12" codeMap="0=0, 1=1, ?=0 1" states="2" value="πεφωτισμενους_τους_οφθαλμους_της_καρδιας_υμων, πεφωτισμενους_τους_οφθαλμους_της_καρδιας"/>
</data>
</beast>
(Granted, this example will fail for other reasons once you get past the SAXParseException, but the parsing is my only concern here.) I was able to isolate the line that was triggering the SAXParseException; it seems to be something in the value attribute of the charstatelabels element. Curiously, the exception still gets thrown if the charstatelabels element is commented out.
from beast2.
Thanks. I've tried to reproduce this using your file, but instead I get an error because of the lack of a <run> element:
Error 102 parsing the xml input file
Expected run element in file
Error detected about here:
<beast>
Thus it seems that, in my case at least, the XML itself is being parsed without problem.
Perhaps this is somehow an environment/locale problem?
from beast2.
It sounds like it must be! For what it's worth, I'm running BEAST v2.7.3 in the BEAST.exe GUI on Windows 10, using the download for Windows at https://www.beast2.org/.
from beast2.
Perhaps the file you're using is not actually using utf-8 encoding, but rather something like iso-8859-1? From your error, it looks as though the XML parser is trying to interpret the input as utf-8 but failing.
If you're unsure, can you perhaps attach the actual file you're using? (Copying your code block above resulted in utf-8 on my end, but this may have happened automatically.)
from beast2.
It's a possibility. I checked for this before I posted this issue because I've made that mistake before, but VS Code and Notepad both are telling me that the encoding is UTF-8. Still, I can't think of what else it could be, so I've attached the file here. (I had to zip it because GitHub doesn't allow XML attachments.)
beast_unicode_mwe.zip
from beast2.
Hrm, yes it appears the encoding is indeed utf-8. How mysterious! I don't have any further suggestions at this point, besides doubling down on my feeling that this has to do with locale somehow. Sorry!
from beast2.
It took me a while to confirm it, but your suspicion was correct: for some reason (which I still don't know), my JVM did not have UTF-8 as the default file encoding. I had to enter the command
set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
and then run BEAST.exe from the command line, and this time I got the same missing <run> element error message that you did.
I can now close this issue. But I do have to ask: is UTF-8 file encoding the default setting in Java for most people? And if not, is there a simple way to set this on Windows so I don't have to do it on the command line every time? (I suppose adding JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 as an environment variable would work, but I'm wondering if there's an alternate way to configure the default settings of the JVM.)
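For anyone wanting to check what their own JVM defaults to, here is a minimal sketch. It is not part of BEAST, and the class name DefaultEncodingCheck is made up for illustration. It only reports the file.encoding system property and the charset the JVM actually selected. As far as I know, JDK 18 and later default to UTF-8 on every platform (JEP 400), while older JDKs derive the default from the OS locale, which on Windows is often a legacy code page.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of BEAST: reports the encoding defaults
// that the running JVM started with.
public class DefaultEncodingCheck {

    /** Returns a one-line summary of the JVM's default encoding settings. */
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once normally, then again with
        // JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 set, and compare the output.
        System.out.println(report());
    }
}
```

Running it with and without the JAVA_TOOL_OPTIONS setting from the command above should show whether the variable is being picked up by the JVM.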
from beast2.
Also weirdly, the error reads as if it is a UTF-8 decoding problem (attempting to decode a non-utf-8-encoded file as if it were utf-8). I guess the strings were read from the file assuming some other encoding, then the XML parser - assuming utf-8-encoded input - failed because its input (in memory) was not valid utf-8?
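To make the suspected mechanism concrete, here is a small sketch of what a charset mismatch does to text. It is only an illustration under assumptions: it does not reproduce BEAST's actual file-loading path, the class and method names are hypothetical, and ISO-8859-1 stands in for whatever legacy default the JVM may have picked. The Greek sample is written with Unicode escapes so the source file itself does not depend on the compiler's encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not BEAST code: shows how the same bytes decode
// differently depending on the charset that is assumed.
public class MismatchedDecodingDemo {

    /** Decodes the bytes with the given charset (lenient, like new String). */
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Strictly decodes as UTF-8; returns false if the bytes are malformed. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "καρδιας" from the example XML, as Unicode escapes.
        String greek = "\u03ba\u03b1\u03c1\u03b4\u03b9\u03b1\u03c2";
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);

        // Reading UTF-8 bytes with the right charset recovers the text...
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8).equals(greek));
        // ...while a legacy single-byte charset silently produces other characters.
        System.out.println(decodeAs(utf8, StandardCharsets.ISO_8859_1).equals(greek));

        // Conversely, bytes written in a legacy encoding are rejected by a
        // strict UTF-8 decoder, which is the kind of failure a parser reports.
        byte[] latin1 = "caf\u00e9s".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(latin1));
    }
}
```

Passing an explicit charset wherever bytes become text avoids depending on the platform default at all, which is presumably what a configuration-level fix achieves from the outside.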
from beast2.
But only under Windows, so it must have had something to do with the way the OS loads files into memory then. Anyway, I hope adeb99c fixed the problem for the future.
from beast2.