Comments (6)

trungdong commented on July 23, 2024

Sounds reasonable. I'll make it optional in the next release.

BTW, I found that recent versions of pip (>7.0) cache locally built packages such as lxml, so a package is only built the first time. Subsequent installs are almost instant. You might want to try it in the meantime.

from prov.

krischer commented on July 23, 2024

I am also okay with this. A better solution would be to just replace lxml with the built-in Python ElementTree implementation: https://docs.python.org/3.5/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

It should be fairly straightforward, and the test suite is comprehensive enough to ensure it works correctly. There might be some issues with unicode, which is why I usually prefer lxml, but again, it should be possible to make do with the Python stdlib version.

It might be a while before I have enough time to give this a try so please feel free to attempt it yourself!


hachterberg commented on July 23, 2024

I might attempt it myself in the near future. In that case I would probably try to import lxml and fall back on ETree if that is not present, as lxml is faster and more feature complete.
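A minimal sketch of the import fallback described above. The `HAVE_LXML` flag and the sample XML are illustrative assumptions, not part of the prov code base.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree
    HAVE_LXML = True
except ImportError:
    from xml.etree import ElementTree as etree
    HAVE_LXML = False

# Both modules share the core ElementTree API used here, so this code
# runs unchanged with either backend.
root = etree.fromstring("<doc><item>1</item></doc>")
text = root.find("item").text
```

Code that needs lxml-only features (such as `getparent()` or `pretty_print=True`) would have to check `HAVE_LXML` before using them.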

The only big annoyance of ElementTree is that it lacks support for pretty printing IIRC, but that can be fixed by a slow but easy step: feeding the generated XML into xml.dom.minidom, which supports pretty printing (see http://stackoverflow.com/questions/749796/pretty-printing-xml-in-python). Although it works, I am not sure it is a good method. Do you guys know a better way off the top of your head?
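The minidom round trip mentioned above could look roughly like this. The helper name `pretty_xml` and the sample element are assumptions made for illustration.

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET


def pretty_xml(element, indent="  "):
    """Serialize an Element, then re-parse it with minidom to indent it."""
    raw = ET.tostring(element, encoding="utf-8")
    return minidom.parseString(raw).toprettyxml(indent=indent)


root = ET.Element("doc")
ET.SubElement(root, "item").text = "1"
pretty = pretty_xml(root)
print(pretty)
```

The document is serialized and parsed a second time, which is where the extra cost comes from. On Python 3.9 and later, `xml.etree.ElementTree.indent()` indents a tree in place and may make the round trip unnecessary.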


krischer commented on July 23, 2024

Do you guys know a better way off the top of your head?

No, I am not aware of anything better. If you default to lxml and fall back to the built-in ElementTree, then one has proper pretty printing if really necessary (and lxml is installed); otherwise the minidom trick is good enough.

That sounds like a really good solution to all problems in this issue if documented correctly :-)


hachterberg commented on July 23, 2024

I had a quick look at the code. There are three other problems with using xml.etree.ElementTree: (1) it does not support comments (they just disappear), (2) the code uses xpath/getparent to remove the comments, which is lxml-specific, and (3) it seems to screw up namespaces if you don't handle them yourself.

Even the most naive test I did to compare lxml.etree and xml.etree.ElementTree already showed some of these problems:

# version using lxml
>>> from lxml import etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:ex="http://example.com/ns/ex#">\n\n  <!-- prov statements go here -->\n\n</prov:document>'
# version using ElementTree
>>> from xml.etree import ElementTree as etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<ns0:document xmlns:ns0="http://www.w3.org/ns/prov#">\n\n  \n\n</ns0:document>'

I will see if I can add support for parsing with comments and namespaces without having to write a huge XML module in the end. Otherwise it would probably be better to stick with lxml :-) I think the xpath-based comment removal would not be too hard to replace by traversing the tree, but I am worried about the poor comment and namespace support.
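For reference, a sketch of how the standard library might cover these points, assuming Python 3.8 or later (where `TreeBuilder` accepts `insert_comments`). The `SAMPLE` string and the `strip_comments` helper are illustrative and are not taken from prov or from example_41.xml.

```python
from xml.etree import ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"

# Illustrative input only, not the contents of example_41.xml.
SAMPLE = (
    '<prov:document xmlns:prov="http://www.w3.org/ns/prov#">'
    "<!-- prov statements go here -->"
    "</prov:document>"
)

# Keep the "prov" prefix on output instead of a generated one like "ns0".
ET.register_namespace("prov", PROV_NS)

# insert_comments=True (Python 3.8+) keeps comments in the parsed tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)
with_comments = ET.tostring(root, encoding="unicode")


def strip_comments(element):
    """Remove comment nodes by walking the tree, without xpath or getparent.

    Note: any tail text attached to a removed comment is dropped with it.
    """
    for child in list(element):
        if child.tag is ET.Comment:
            element.remove(child)
        else:
            strip_comments(child)


strip_comments(root)
without_comments = ET.tostring(root, encoding="unicode")
```

This only addresses prefixes for namespaces that are registered up front. As far as I know, ElementTree still omits namespace declarations that no element or attribute uses, so output would not match lxml byte for byte.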


trungdong commented on July 23, 2024

The lxml dependency is now installed from a wheel by default, which is pretty fast. I guess this (i.e. building lxml from source) is no longer an issue.

