Would it be possible to make the lxml package requirement optional, similar to the pyd

lxml as optional dependency instead of requirement about prov HOT 6 CLOSED

trungdong commented on July 23, 2024

lxml as optional dependency instead of requirement

from prov.

Comments (6)

trungdong commented on July 23, 2024

Sounds reasonable. I'll make it optional in the next release.

BTW, I found that recent versions of pip (>7.0) cache built libraries like lxml. Hence, it only builds a package the first time. Subsequent installs are almost instant. You might want to try it in the meantime.


krischer commented on July 23, 2024

I am also okay with this. A better solution would be to just replace lxml with the built-in Python ElementTree implementation: https://docs.python.org/3.5/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

It should be fairly straightforward, and the test suite is comprehensive enough to ensure it works correctly. There might be some issues with unicode - that's why I usually prefer lxml, but again - it should be possible to make do with the Python stdlib version.

It might be a while before I have enough time to give this a try so please feel free to attempt it yourself!


hachterberg commented on July 23, 2024

I might attempt it myself in the near future. In that case I would probably try to import lxml and fall back on ETree if that is not present, as lxml is faster and more feature complete.
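A minimal sketch of the import fallback described above, assuming the calling code only uses the API subset that lxml.etree and xml.etree.ElementTree share. The `HAVE_LXML` flag is a hypothetical name, not something that exists in prov:

```python
try:
    # Prefer lxml when it is installed: faster and more feature complete.
    from lxml import etree
    HAVE_LXML = True  # hypothetical flag, for illustration only
except ImportError:
    # Fall back on the standard library implementation.
    from xml.etree import ElementTree as etree
    HAVE_LXML = False

# Both implementations expose fromstring(), find() and tostring().
root = etree.fromstring('<doc><item>1</item></doc>')
print(HAVE_LXML, root.find('item').text)
```

Anything lxml-specific (for example `xpath()` or `getparent()`) would still need to be guarded by the flag.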

The only big annoyance of ElementTree is that it lacks support for pretty printing IIRC, but that can be fixed by a slow but easy step: feeding the generated XML into xml.dom.minidom, which supports pretty printing (see http://stackoverflow.com/questions/749796/pretty-printing-xml-in-python). I am not sure that, although it works, it is a good method. Do you guys know a better way off the top of your head?
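For reference, the minidom round trip could look roughly like this. It is only a sketch: `prettify` is a hypothetical helper name and the sample elements are made up for illustration.

```python
from xml.dom import minidom
from xml.etree import ElementTree as etree


def prettify(element, indent="  "):
    """Serialize an ElementTree element, then re-parse and indent it with minidom."""
    raw = etree.tostring(element, encoding="utf-8")
    return minidom.parseString(raw).toprettyxml(indent=indent)


# Made-up sample tree, not taken from the prov test suite.
root = etree.Element("document")
etree.SubElement(root, "entity", {"id": "e1"})
pretty = prettify(root)
print(pretty)
```

The document is serialized and parsed a second time, which is where the slowness comes from. On Python 3.9 and later, `xml.etree.ElementTree.indent()` indents a tree in place and may make the round trip unnecessary.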


krischer commented on July 23, 2024

> Do you guys know a better way off the top of your head?

No, I am not aware of anything better. If you default to lxml and fall back to the built-in ElementTree, then one has proper pretty printing if really necessary (and lxml is installed) - otherwise the minidom trick is good enough.

That sounds like a really good solution to all problems in this issue if documented correctly :-)


hachterberg commented on July 23, 2024

I had a quick look at the code. There are three other problems with using xml.etree.ElementTree: (1) it does not support comments (they just disappear), (2) the code uses xpath/getparent to remove the comments, and those are lxml features, and (3) it seems to screw up namespaces if you don't handle them yourself.

The most naive test to compare lxml.etree and xml.etree.ElementTree I did already showed some of the problems:

# version using lxml
>>> from lxml import etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:ex="http://example.com/ns/ex#">\n\n  <!-- prov statements go here -->\n\n</prov:document>'
# version using ElementTree
>>> from xml.etree import ElementTree as etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<ns0:document xmlns:ns0="http://www.w3.org/ns/prov#">\n\n  \n\n</ns0:document>'

I will see whether I can add support for parsing with comments and namespaces without having to write a huge XML module in the end. Otherwise it would probably be better to stick with lxml :-) I think the xpath-based comment removal would not be too hard to do by traversing the tree, but I am worried about the poor comment and namespace support.
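A possible stdlib-only sketch of the three points, under some assumptions: it needs Python 3.8 or later (for `TreeBuilder(insert_comments=True)`), the sample document is a shortened stand-in for example_41.xml, and `strip_comments` is a hypothetical helper rather than anything in prov:

```python
from xml.etree import ElementTree as etree

PROV_NS = "http://www.w3.org/ns/prov#"
# Shortened stand-in for prov/tests/xml/example_41.xml.
SAMPLE = (
    '<prov:document xmlns:prov="http://www.w3.org/ns/prov#">'
    '<!-- prov statements go here -->'
    '<prov:entity/>'
    '</prov:document>'
)

# Serialize with the "prov" prefix instead of an auto-generated "ns0".
etree.register_namespace("prov", PROV_NS)

# Keep comments in the parsed tree (Python 3.8+ only).
parser = etree.XMLParser(target=etree.TreeBuilder(insert_comments=True))
root = etree.fromstring(SAMPLE, parser=parser)
with_comments = etree.tostring(root, encoding="unicode")


def strip_comments(element):
    """Remove comment nodes by walking the tree, without xpath or getparent."""
    for child in list(element):
        if child.tag is etree.Comment:
            # Note: remove() also drops any tail text after the comment.
            element.remove(child)
        else:
            strip_comments(child)


strip_comments(root)
without_comments = etree.tostring(root, encoding="unicode")
print(with_comments)
print(without_comments)
```

ElementTree still writes only the namespace declarations that are actually used, so unused ones such as xsi or xsd from the original file would not be reproduced on output.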


trungdong commented on July 23, 2024

The lxml dependency is now installed from a wheel file by default, which is pretty fast. I guess this (i.e. building lxml from source) is no longer an issue.

