Comments (6)
Sounds reasonable. I'll make it optional in the next release.
BTW, I found that recent versions of pip (>7.0) cache built libraries like lxml. Hence, it only builds a package the first time, and subsequent installs are almost instant. You might want to try it in the meantime.
from prov.
I am also okay with this. A better solution would be to just replace lxml with the built-in Python ElementTree implementation: https://docs.python.org/3.5/library/xml.etree.elementtree.html#module-xml.etree.ElementTree
It should be fairly straightforward, and the test suite is comprehensive enough to ensure it works correctly. There might be some issues with unicode, which is why I usually prefer lxml, but again, it should be possible to make do with the Python stdlib version.
It might be a while before I have enough time to give this a try so please feel free to attempt it yourself!
from prov.
I might attempt it myself in the near future. In that case I would probably try to import lxml and fall back on ETree if that is not present, as lxml is faster and more feature complete.
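A minimal sketch of that import fallback, assuming the calling code only uses the API subset that both libraries share (`Element`, `SubElement`, `tostring`). The `HAVE_LXML` flag is a hypothetical name, not something taken from the prov code base:

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree
    HAVE_LXML = True  # hypothetical flag for guarding lxml-only features
except ImportError:
    from xml.etree import ElementTree as etree
    HAVE_LXML = False

# Calls limited to the shared API behave the same with either backend.
root = etree.Element("document")
etree.SubElement(root, "entity").text = "e1"
xml_bytes = etree.tostring(root)
```

Anything that exists only in lxml (xpath, getparent, pretty printing on `tostring`) would still need a guard on the flag or a separate code path.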
The only big annoyance of ElementTree is that it lacks support for pretty printing IIRC, but that can be fixed by a slow but easy step: feeding the generated XML into xml.dom.minidom, which supports pretty printing (see http://stackoverflow.com/questions/749796/pretty-printing-xml-in-python). It works, but I am not sure it is a good method. Do you guys know a better way off the top of your head?
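The minidom round trip could look roughly like this. It is a sketch only; `pretty_xml` is a hypothetical helper name and the two-space indent is an arbitrary choice:

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET


def pretty_xml(element, indent="  "):
    """Serialize an ElementTree element, then re-indent it via minidom."""
    raw = ET.tostring(element, encoding="utf-8")
    # The document is parsed a second time here, which is the slow part.
    return minidom.parseString(raw).toprettyxml(indent=indent)


root = ET.Element("document")
ET.SubElement(root, "entity").text = "e1"
print(pretty_xml(root))
```

On Python 3.9 and later, `xml.etree.ElementTree.indent` re-indents a tree in place and avoids the second parse, so it may be the better choice where that version can be assumed.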
from prov.
> Do you guys know a better way off the top of your head?
No, I am not aware of anything better. If you default to lxml and fall back to the built-in ElementTree, then one has proper pretty printing if really necessary (and lxml is installed); otherwise the minidom trick is good enough.
That sounds like a really good solution to all problems in this issue if documented correctly :-)
from prov.
I had a quick look at the code. There are three other problems with using xml.etree.ElementTree: (1) it does not support comments (they just disappear), (2) the code uses xpath/getparent to remove the comments, which is lxml-only, and (3) it seems to screw up namespaces if you don't handle them yourself.
The most naive test I did to compare lxml.etree and xml.etree.ElementTree already showed some of the problems:
# version using lxml
>>> from lxml import etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:ex="http://example.com/ns/ex#">\n\n <!-- prov statements go here -->\n\n</prov:document>'
# version using ElementTree
>>> from xml.etree import ElementTree as etree
>>> tree = etree.parse('prov/tests/xml/example_41.xml')
>>> etree.tostring(tree.getroot())
'<ns0:document xmlns:ns0="http://www.w3.org/ns/prov#">\n\n \n\n</ns0:document>'
I will see whether I can add support for parsing with comments and namespaces without having to write a huge XML module in the end. Otherwise it would probably be better to stick with lxml :-) I think the xpath-based comment removal would not be too hard to replace by traversing the tree, but I am worried about the poor comment and namespace support.
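For what it is worth, the standard library can get part of the way there. The sketch below assumes Python 3.8 or later, where `TreeBuilder` accepts `insert_comments=True`, and uses an inline sample string instead of the test file. `strip_comments` is a hypothetical helper, not code from prov:

```python
from xml.etree import ElementTree as ET

SAMPLE = (
    '<prov:document xmlns:prov="http://www.w3.org/ns/prov#">'
    "<!-- prov statements go here -->"
    "</prov:document>"
)

# Keep the "prov" prefix on output instead of a generated one such as "ns0".
ET.register_namespace("prov", "http://www.w3.org/ns/prov#")

# Python 3.8+: ask the tree builder to keep comment nodes while parsing.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)
with_comments = ET.tostring(root, encoding="unicode")


def strip_comments(element):
    """Remove comment nodes by walking the tree (no xpath or getparent)."""
    for parent in list(element.iter()):
        for child in list(parent):
            if child.tag is ET.Comment:
                # Note: any tail text after the comment is dropped as well.
                parent.remove(child)


strip_comments(root)
without_comments = ET.tostring(root, encoding="unicode")
```

This does not restore everything lxml preserves: ElementTree only writes declarations for namespaces that are actually used in the tree, so unused ones such as xsi or xsd in the example above would still be dropped.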
from prov.
The lxml dependency is now installed from a wheel file by default, which is pretty fast. I guess this (i.e. building lxml from source) is no longer an issue.
from prov.