Comments (11)

rbouckaert commented on June 30, 2024

Weirdly enough, this does not seem to be a problem with Linux or Mac, only Windows. I updated configuration files so BEAST and BEAUti should recognise UTF-8 under Windows at the next release.

from beast2.

tgvaughan commented on June 30, 2024

Hi @jjmccollum, you're not missing anything: UTF-8-encoded files are permitted. Can you please provide a minimal example XML which produces the error? Thanks.

jjmccollum commented on June 30, 2024

@tgvaughan Apologies that it took me so long to come up with one! It turns out that very little is needed to reproduce the error:

<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<beast version="2.7" namespace="beast.pkgmgmt:beast.base.core:beast.base.inference:beast.base.evolution:beast.base.evolution.alignment:beast.base.evolution.datatype:beast.base.evolution.tree:beast.base.evolution.tree.coalescent:beast.base.evolution.branchratemodel:beast.base.inference.util:beast.evolution.nuc:beast.base.evolution.operator:beast.base.inference.operator:beast.base.evolution.sitemodel:beast.base.evolution.substitutionmodel:beast.base.evolution.likelihood">
  <data spec="Alignment" id="alignment" dataType="standard" statecount="2">
    <!-- Remove the following line, and the SAXParseException will not be thrown: -->
    <charstatelabels spec="UserDataType" characterName="B10K1V18U2-12" codeMap="0=0, 1=1, ?=0 1" states="2" value="πεφωτισμενους_τους_οφθαλμους_της_καρδιας_υμων, πεφωτισμενους_τους_οφθαλμους_της_καρδιας"/>
  </data>
</beast>

(Granted, this example will fail for other reasons once you get past the SAXParseException, but the parsing is my only concern here.) I was able to isolate the line that was triggering the SAXParseException; it seems to be something in the value attribute of the charstatelabels element. Curiously, the exception still gets thrown if the charstatelabels element is commented out.

tgvaughan commented on June 30, 2024

Thanks. I've tried to reproduce this using your file, but instead I get an error because of the lack of a <run> element:

Error 102 parsing the xml input file

Expected run element in file

Error detected about here:
  <beast>

Thus it seems that, in my case at least, the XML itself is being parsed without problem.

Perhaps this is somehow an environment/locale problem?

jjmccollum commented on June 30, 2024

It sounds like it must be! For what it's worth, I'm running BEAST v2.7.3 in the BEAST.exe GUI on Windows 10, using the download for Windows at https://www.beast2.org/.

tgvaughan commented on June 30, 2024

Perhaps the file you're using is not actually using utf-8 encoding, but rather something like iso-8859-1? From your error, it looks as though the XML parser is trying to interpret the input as utf-8 but failing.

If you're unsure, can you perhaps attach the actual file you're using? (Copying your code block above resulted in utf-8 on my end, but this may have happened automatically.)

jjmccollum commented on June 30, 2024

It's a possibility. I checked for this before I posted this issue because I've made that mistake before, but VS Code and Notepad both are telling me that the encoding is UTF-8. Still, I can't think of what else it could be, so I've attached the file here. (I had to zip it because GitHub doesn't allow XML attachments.)
beast_unicode_mwe.zip

tgvaughan commented on June 30, 2024

Hrm, yes it appears the encoding is indeed utf-8. How mysterious! I don't have any further suggestions at this point, besides doubling down on my feeling that this has to do with locale somehow. Sorry!

jjmccollum commented on June 30, 2024

It took me a while to confirm it, but your suspicion was correct: for some reason (which I still don't know), my JVM did not have UTF-8 as the default file encoding. I had to enter the command

set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8

and then run BEAST.exe from the command line, and this time I got to the missing <run> element error message you got.

I can now close this issue. But I do have to ask: is UTF-8 file encoding the default setting in Java for most people? And if not, is there a simple way to set this on Windows so I don't have to do it on the command line every time? (I suppose adding JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 as an environment variable would work, but I'm wondering if there's an alternate way to configure the default settings of the JVM.)
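
For anyone wanting to check which encoding their own JVM falls back to, here is a small Java sketch. The class name EncodingCheck and its two helper methods are made up for illustration and are not part of BEAST. As background: on JDKs before 18 the default charset is derived from the OS locale, which on Windows is often a legacy code page such as windows-1252, while JDK 18 and later default to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of BEAST.
public class EncodingCheck {

    /** Returns a one-line summary of the encodings this JVM reports. */
    static String describeEncoding() {
        return "default charset = " + Charset.defaultCharset().name()
             + ", file.encoding = " + System.getProperty("file.encoding");
    }

    /** True if the JVM's default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(describeEncoding());
        if (!defaultIsUtf8()) {
            // Code that reads files without naming a charset will use
            // the default reported above instead of UTF-8.
            System.out.println("Default charset is not UTF-8.");
        }
    }
}
```

Running it once as-is and once after the set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 command above should show whether the variable is being picked up by the JVM.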

tgvaughan commented on June 30, 2024

Also weirdly, the error reads as if it is a UTF-8 decoding problem (attempting to decode a non-utf-8-encoded file as if it were utf-8). I guess the strings were read from the file assuming some other encoding, then the XML parser - assuming utf-8-encoded input - failed because its input (in memory) was not valid utf-8?
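
The kind of mismatch guessed at here can be reproduced in isolation. The Java sketch below is only an illustration of the general mechanism, not a trace of what BEAST actually does internally; the class EncodingMismatchDemo and its methods are hypothetical names. It shows two things: UTF-8 bytes decoded with a single-byte charset turn into a different, longer string, and bytes written in a single-byte charset can be rejected by a strict UTF-8 decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only; not part of BEAST.
public class EncodingMismatchDemo {

    /** Decodes bytes with the given charset, as a reader with that default would. */
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Strict UTF-8 check: reports malformed input instead of substituting U+FFFD. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Greek word from the example XML, written with Unicode escapes
        // so this source file compiles identically under any source encoding.
        String greek = "\u03ba\u03b1\u03c1\u03b4\u03b9\u03b1\u03c2";
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);

        // Reading UTF-8 bytes as ISO-8859-1 yields one char per byte.
        String misread = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("original chars: " + greek.length()
            + ", misread chars: " + misread.length());
        System.out.println("misread equals original: " + misread.equals(greek));

        // A lone 0xE9 byte ('e' with acute accent in ISO-8859-1) is not valid UTF-8.
        byte[] latin1 = { (byte) 0xE9 };
        System.out.println("UTF-8 bytes valid: " + isValidUtf8(utf8));
        System.out.println("Latin-1 byte valid as UTF-8: " + isValidUtf8(latin1));
    }
}
```

Naming the charset explicitly whenever bytes are turned into text (as decodeAs does) removes the dependence on the JVM default, which is the design choice this sketch is meant to highlight.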

rbouckaert commented on June 30, 2024

But only under Windows, so it must have had something to do with the way the OS loads files into memory then. Anyway, I hope adeb99c fixed the problem for the future.
