trungdong / prov
A Python library for W3C Provenance Data Model (PROV)
Home Page: http://prov.readthedocs.io/
License: MIT License
Traditionally, all Unix programs are documented with manual pages, and Linux distributions often recommend providing a manpage for each installed command-line utility.
It would be nice to improve the prov package so that it generates a manpage for each script. For instance, this could be achieved by integrating a pair of custom setuptools build and install commands that call help2man on the scripts.
Cheers,
This does not make sense to me when reading in a PROV-JSON file:
>>> bundle.get_records(ProvActivity)[0].get_attributes()[0].keys()
[101, 102]
How am I meant to know what 101 and 102 are? Look up among the hardcoded constants?
I don't see what purpose these integer constants serve (perhaps this is a leftover from a C-version of the code?), so I would get rid of them and simply use normal strings like "startTime" or "prov:startTime".
While reviewing a Debian package for the 1.5.0 release I noticed that the tarball (fetched from PyPI) contains some files that are not in this git repository at the 1.5.0 release tag.
diff -Naur --exclude=.git prov 1.5.0-1/python-prov-1.5.0 | diffstat | grep -v -e 'debian/' -e '\.pc/' -e 'egg-info' -e PKG-I | grep '\+'
dev/debug_jsons.py | 58 ++
dev/script.py | 99 +++
dev/test_count_JSON-LD.py | 38 +
dev/test_count_Streaming.py | 82 +++
prov/serializers/jsonld_streaming.py | 209 +++++++
prov/serializers/provjsons.py | 368 ++++++++++++++
My guess is that the release tarball was generated on a development box in a non-clean git repository, so those files got picked up as well. Just wanted to let you know.
(Since git clones are so cheap, paranoid me usually clones the repo afresh before running the 'sdist' command ;) )
Failing test:
# coding: utf8
import unittest
from prov.model import *
import datetime


class FailingTest(unittest.TestCase):
    def setUp(self):
        pass

    def test_inserting_arbitrary_attributes(self):
        # Two bundles built from identical attributes should compare equal.
        g = ProvBundle()
        attributes = {
            'ex:int': 100,
            'ex:float': 100.123456,
            'ex:str': 'Some string',
            'ex:unicode': u'Some unicode string with accents: Huỳnh Trung Đông',
            'ex:timedate': datetime.datetime(2012, 12, 12, 14, 7, 48),
            # Literal comes from the star import above; "prov" itself is not bound.
            'ex:literal': Literal("hi", datatype="xsd:notImplemented")
        }
        g.entity('ex:req3', attributes)
        h = ProvBundle()
        attributes = {
            'ex:int': 100,
            'ex:float': 100.123456,
            'ex:str': 'Some string',
            'ex:unicode': u'Some unicode string with accents: Huỳnh Trung Đông',
            'ex:timedate': datetime.datetime(2012, 12, 12, 14, 7, 48),
            'ex:literal': Literal("hi", datatype="xsd:notImplemented")
        }
        h.entity('ex:req3', attributes)
        self.assertEqual(h, g)


if __name__ == '__main__':
    unittest.main()
Let's say at time t0, I have a bundle with an entity E1 and an activity A1.
activity A1 has a prov:startTime.
Later, at time t1, I want to add a new entity E2 to that bundle and I want to add prov:endTime to the activity A1.
Since I am not going to write the full provenance data in one shot, I would like to use something like "save_bundle", but one that calls, for instance:
PDRecord.objects.get_or_create instead of PDRecord.objects.create
and
PDBundle.objects.get_or_create instead of PDBundle.objects.create
Using the following example as input to this library and to the online translator, there is a difference in how qnames are handled. Is there a definitive answer? (I would prefer what the online translator did.)
https://github.com/trungdong/prov/blob/master/prov/tests/json/activity6.json
The main difference is how the element with type xsd:QName is handled.
document
prefix ex <http://example.org/>
activity(ex:a6, -, -, [prov:location=1, prov:location="ex:london" %% xsd:QName,
prov:location="London", prov:location="1.0" %% xsd:float,
prov:location=2014-06-23T12:28:53.858000+01:00,
prov:location="2002" %% xsd:gYear, prov:label="activity6"])
endDocument
document
prefix xsd <http://www.w3.org/2001/XMLSchema>
prefix ex <http://example.org/>
activity(ex:a6,-,-,[prov:label = "activity6", prov:location = "London" %% xsd:string, prov:location = "1" %%
xsd:int, prov:location = "1.0" %% xsd:float, prov:location = "true" %% xsd:boolean, prov:location =
'ex:london', prov:location = "2014-06-23T12:28:53.858+01:00" %% xsd:dateTime, prov:location =
"http://example.org/london" %% xsd:anyURI, prov:location = "2002" %% xsd:gYear])
endDocument
@trungdong - is there an example for bundle that prints out a bundle statement? for example could you write an example that generates the following provn representation?
document
bundle agg:bundle3
entity(ex:report1, [ prov:type="report", ex:version=1 ])
wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
entity(ex:report2, [ prov:type="report", ex:version=2 ])
wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
wasDerivedFrom(ex:report2, ex:report1)
endBundle
entity(agg:bundle3, [ prov:type='prov:Bundle' ])
agent(ex:aggregator01, [ prov:type='ex:Aggregator' ])
wasAttributedTo(agg:bundle3, ex:aggregator01)
wasDerivedFrom(agg:bundle3, bob:bundle1)
wasDerivedFrom(agg:bundle3, alice:bundle2)
endDocument
@trungdong - is there a reason why you created a separate jsonld branch instead of trying it through the rdf branch?
Hello,
I tried out the tutorial at http://nbviewer.ipython.org/github/trungdong/notebooks/blob/master/PROV%20Tutorial.ipynb; everything is very nicely explained. But I always got exceptions like this when trying to retrieve ProvDocuments from ProvStore:
TypeError: initial_value must be str or None, not bytes
This exception was raised by the deserializer in model.py because the content it tried to parse was bytes. I traced the problem to the request call at line 106, r= self._request(...), in api.py of the provstore-api module. It returns the content as bytes, which the io.StringIO call in the deserializer rejects because it accepts only unicode. I simply decoded the content as UTF-8 right before the api calls the deserializer, and everything worked again. I don't know what is responsible for this incompatibility; maybe Windows 7, which I'm using? Anyway, please check the compatibility of these modules again.
Best wishes
Bojan
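The bytes-versus-text mismatch in the report above can be reproduced with the standard library alone. This is a minimal sketch, not the provstore-api code: raw_content and to_text_stream are hypothetical names, and UTF-8 is assumed as the encoding because that is what the reporter used.

```python
import io

# Hypothetical stand-in for the bytes an HTTP client returns as a response body.
raw_content = '{"entity": {"ex:e1": {}}}'.encode("utf-8")


def to_text_stream(content, encoding="utf-8"):
    """Return an io.StringIO for content, decoding bytes first if needed."""
    if isinstance(content, bytes):
        content = content.decode(encoding)
    return io.StringIO(content)


# Passing bytes straight to io.StringIO raises the TypeError from the report.
try:
    io.StringIO(raw_content)
    raised = False
except TypeError:
    raised = True

# Decoding first gives a text stream that a text-based parser can read.
stream = to_text_stream(raw_content)
text = stream.read()
```

Decoding at the boundary where the bytes arrive keeps the parser itself text-only.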
Hi 👊
This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.
Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.
That's it for now!
Happy merging! 🤖
Saving literals with unicode string to a database fails with the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1ef3' in position 180: ordinal not in range(128)
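The error class can be shown in isolation. The sketch below is written for Python 3, where the encoding step is explicit, whereas the report comes from Python 2, where an implicit ASCII encode was likely triggered somewhere on the way to the database. It only illustrates why the character u'\u1ef3' cannot be encoded as ASCII and that UTF-8 round-trips it; it is not the library's storage code.

```python
# The name from the failing test, written with escapes so the source stays ASCII.
name = "Hu\u1ef3nh Trung \u0110\u00f4ng"

# Encoding to ASCII fails on the first non-ASCII character.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds and decodes back to the same text.
utf8_bytes = name.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```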
prov:QualifiedName is not defined by any of the PROV documents, while PROV-N specifies that prov:QUALIFIED_NAME is the type for qualified name literals:
The non terminals STRING_LITERAL, INT_LITERAL, and QUALIFIED_NAME_LITERAL are syntactic sugar for quoted strings with datatype xsd:string, xsd:int, and prov:QUALIFIED_NAME respectively.
In the example below the PROV-N serialization of the wasInfluencedBy relation includes a "-".
In [1]: import prov.model as prov
In [2]: g = prov.ProvBundle()
In [3]: g.entity("e1")
Out[3]: <prov.model.ProvEntity at 0x101f2d550>
In [4]: g.entity("e2")
Out[4]: <prov.model.ProvEntity at 0x101f2d810>
In [5]: g.wasInfluencedBy("e1","e2")
Out[5]: <prov.model.ProvInfluence at 0x101f2d9d0>
In [6]: print g
document
entity(e1)
entity(e2)
wasInfluencedBy(e1, e2, -)
endDocument
Based on the PROV-DM examples for this relation at http://www.w3.org/TR/prov-dm/#dfn-wasinfluencedby, it appears that this "-" is unnecessary.
The issue arises when using the ProvToolbox to convert this PROV-N to RDF.
PROVCONVERT code says:
TestPreProcessSHORT_FilledInJobRECURSIVE.provn line 256:54 mismatched input '-' expecting OPEN_SQUARE_BRACE
If I go through the PROVN file and remove the ,- (comma-dashes) then there is no problem. The same dashes are there for the "USED" function but the PROVCONVERT program has no problem with them.
Is this a bug in the PROV Python library or ProvToolbox?
International strings will lose their language tags, as LiteralAttribute currently does not support them.
Hi, I am just wondering whether the exported graph can be laid out vertically. Currently (maybe by default) it expands horizontally. Thanks!
Hi @satra, @trungdong,
I am trying to use Namespace to specify prefixes for our alphanumeric identifiers. This works fine with provn, but I can't get it to work with the turtle serialisation. Any help would be greatly appreciated!
Here is a minimal example:
from prov.model import ProvDocument, Namespace, QualifiedName
import urllib.request
import csv
# Get a list of preferred prefixes from an online CSV file
csv_url = "https://raw.githubusercontent.com/incf-nidash/nidm/master/nidm/nidm-results/terms/prefixes.csv"
prefix_file = urllib.request.urlopen(
    csv_url).read().decode('utf-8').splitlines()
prefixes = dict()
reader = csv.reader(prefix_file)
for alphanum_id, prefix, uri in reader:
    prefixes[uri] = Namespace(prefix, uri)['']
## Example of prefixes dictionary
# prefixes = dict(
# [("http://purl.org/nidash/nidm#NIDM_0000170",
# Namespace('nidm_groupName',
# 'http://purl.org/nidash/nidm#NIDM_0000170')['']),
# ("http://purl.org/nidash/nidm#NIDM_0000165",
# Namespace('nidm_NIDMResultsExporter',
# 'http://purl.org/nidash/nidm#NIDM_0000165')[''])],
# )
g = ProvDocument()
group_name_uri = "http://purl.org/nidash/nidm#NIDM_0000170"
ex = Namespace('ex', 'http://example/')
g.entity(ex['group1'], {prefixes[group_name_uri]: "Group 1"})
print(g.serialize(format='provn'))
print("---")
print(g.serialize(format='rdf', rdf_format="turtle"))
and the output:
document
prefix ex <http://example/>
prefix nidm_groupName <http://purl.org/nidash/nidm#NIDM_0000170>
entity(ex:group1, [nidm_groupName:="Group 1"])
endDocument
---
@prefix ex: <http://example/> .
@prefix nidm_groupName: <http://purl.org/nidash/nidm#NIDM_0000170> .
@prefix ns1: <http://purl.org/nidash/nidm#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:group1 a prov:Entity ;
ns1:NIDM_0000170 "Group 1"^^xsd:string .
I would like ns1:NIDM_0000170 to be replaced by nidm_groupName: in the turtle document.
The following are both stored internally as plain str, so on re-serialization there is no way to know which one should be output with a datatype and which without.
ex:a9 ex:tag1 "hello"^^xsd:string .
and
ex:a9 ex:tag1 "hello" .
Currently there is no way to add a comment to a provenance model
Warning by Django 1.5.1:
site-packages/django/db/models/fields/__init__.py:827: RuntimeWarning: DateTimeField received a naive datetime (2013-07-04 00:00:00) while time zone support is active.
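The warning is about a datetime without tzinfo. A minimal standard-library sketch of the difference follows. It assumes UTC is the intended zone, which the report does not state, and is_aware is a hypothetical helper written for this example. In a Django project, django.utils.timezone.make_aware is the usual helper for this.

```python
from datetime import datetime, timezone


def is_aware(dt):
    """True when dt carries usable time zone information."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


# The value from the warning: no tzinfo, hence "naive".
naive = datetime(2013, 7, 4)

# Attach a zone explicitly (UTC is an assumption made for this sketch).
aware = naive.replace(tzinfo=timezone.utc)
```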
Hi all,
this is not an issue but rather a question and probably a pretty easy one as well...
I am trying to add an XML (de)serializer to this package (will send a PR once ready) and it seems to work fairly well except one little detail that I cannot figure out.
The prov:type PROV attributes on the XML spec page (http://www.w3.org/TR/prov-xml/) usually have an xsi:type attribute as well, e.g. in example 6:
<prov:document
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#"
    xmlns:tr="http://example.com/ns/tr#">
  <prov:entity prov:id="tr:WD-prov-dm-20111215">
    <prov:type xsi:type="xsd:QName">document</prov:type>
    <ex:version>2</ex:version>
  </prov:entity>
</prov:document>
Where does the xsi:type="xsd:QName" attribute go in the prov package's internal data model? The rest is easy but this detail for some reason eludes me.
import prov.model as prov
EX_NS = ('ex', 'http://example.org/ns/ex#')
EX_TR = ('tr', 'http://example.org/ns/tr#')
EX_XSI = ('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
document = prov.ProvDocument()
document.add_namespace(*EX_NS)
document.add_namespace(*EX_TR)
document.add_namespace(*EX_XSI)
document.entity("tr:WD-prov-dm-20111215", (
    (prov.PROV_TYPE, "document"),
    ("ex:version", "2")
))
Thanks!
Most of the PROV concepts have at least one attribute required for it to be considered valid. Is there any desire to integrate a method like this?
Howdy! Thanks for this great library. I'm completely new to W3C PROV and the prov library, and this library has made life quite a bit easier.
I believe the example below should work, but I am not sure. I'm trying to add the "foaf:name" attribute to an entity, but I get an error that I'm missing a PROV identifier. "foaf:givenName" works but not "foaf:name". Am I doing something wrong or is this indeed a bug?
doc = prov.ProvDocument()
e1 = iris_doc.entity("ex:entity1", {
'foaf:name': "Entity One",
})
---------------------------------------------------------------------------
ProvElementIdentifierRequired Traceback (most recent call last)
<ipython-input-36-0ce33a77955a> in <module>()
1 doc = prov.ProvDocument()
2 e1 = iris_doc.entity("ex:entity1", {
----> 3 'foaf:name': "Entity One",
4 })
/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in entity(self, identifier, other_attributes)
1182
1183 def entity(self, identifier, other_attributes=None):
-> 1184 return self.new_record(PROV_ENTITY, identifier, None, other_attributes)
1185
1186 def activity(self, identifier, startTime=None, endTime=None,
/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in new_record(self, record_type, identifier, attributes, other_attributes)
1170 )
1171 new_record = PROV_REC_CLS[record_type](
-> 1172 self, self.valid_qualified_name(identifier), attr_list
1173 )
1174 self._add_record(new_record)
/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in __init__(self, bundle, identifier, attributes)
458 if identifier is None:
459 # All types of PROV elements require a valid identifier
--> 460 raise ProvElementIdentifierRequired()
461
462 super(ProvElement, self).__init__(bundle, identifier, attributes)
ProvElementIdentifierRequired: An identifier is missing. All PROV elements require a valid identifier.
Since pydotplus is considered an optional dependency, the test suite should be able to skip the tests that depend on it when it is not installed.
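One common way to do this with the standard unittest module is sketched below. The test class and its body are hypothetical placeholders, not tests from the prov suite. The point is the skipUnless guard driven by importlib.util.find_spec.

```python
import importlib.util
import unittest

# True only when the optional dependency can be imported.
HAS_PYDOTPLUS = importlib.util.find_spec("pydotplus") is not None


class DotTests(unittest.TestCase):
    @unittest.skipUnless(HAS_PYDOTPLUS, "pydotplus is not installed")
    def test_graph_export(self):
        # Placeholder body; a real test would exercise the graph export.
        import pydotplus
        self.assertTrue(hasattr(pydotplus, "__name__"))


# Run the case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DotTests)
result = unittest.TestResult()
suite.run(result)
```

A skipped test is reported as skipped rather than failed, so the suite still passes on machines without the optional package.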
Per conversation with @satra, the RDF branch should test that a prov:location in prov-dm should be converted to prov:atLocation.
Also, if the value of prov:location is a URI, and the URI exists as a subject in document, do we infer that the URI is also a prov:Location (class) when converting to RDF?
@trungdong in prov/model.py the way literal comparison is implemented, it results in:
pm.Literal(10, datatype=pm.XSD_DECIMAL) != pm.Literal(10.0, datatype=pm.XSD_DECIMAL)
is that intended?
When using a MySQL backend (Version 14.14 Distrib 5.5.29 for debian-linux-gnu (x86_64)), I got the following error at database creation:
Failed to install index for persistence.PDNamespace model: (1170, "BLOB/TEXT column 'uri' used in key specification without a key length")
Failed to install index for persistence.PDRecord model: (1170, "BLOB/TEXT column 'rec_id' used in key specification without a key length")
There are related questions on SO:
http://stackoverflow.com/questions/11129309/how-do-i-specify-an-index-for-a-textfield-in-django-with-a-mysql-backend
and
http://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length
@trungdong - how do you run unit tests for prov?
Following up on the previous issue with None, on current master:
g = pm.ProvDocument()
g.add_namespace(pm.Namespace('ex', 'http://example.org/'))
a1 = g.activity('ex:a1')
ag1 = g.agent('ex:ag1')
as1 = g.wasAssociatedWith(a1, agent=ag1)
print(g.get_provn())
formal_objects = []
for key, val in as1.formal_attributes:
    formal_objects.append(key)
print(g.get_provn())
results in:
document
prefix ex <http://example.org/>
activity(ex:a1, -, -)
agent(ex:ag1)
wasAssociatedWith(ex:a1, ex:ag1, -)
endDocument
document
prefix ex <http://example.org/>
activity(ex:a1, -, -)
agent(ex:ag1)
wasAssociatedWith(ex:a1, ex:ag1, None)
endDocument
This happens because accessing the formal_attributes property changes self._attributes. Since _attributes is a defaultdict(set), simply looking up a missing attribute adds it to the dictionary:
g = pm.ProvDocument()
g.add_namespace(pm.Namespace('ex', 'http://example.org/'))
a1 = g.activity('ex:a1')
ag1 = g.agent('ex:ag1')
as1 = g.wasAssociatedWith(a1, agent=ag1)
print(as1._attributes)
print(as1.formal_attributes)
print(as1._attributes)
results in:
defaultdict(<type 'set'>, {<QualifiedName: prov:agent>: set([<QualifiedName: ex:ag1>]),
<QualifiedName: prov:activity>: set([<QualifiedName: ex:a1>])})
((<QualifiedName: prov:activity>, <QualifiedName: ex:a1>), (<QualifiedName: prov:agent>,
<QualifiedName: ex:ag1>), (<QualifiedName: prov:plan>, None))
defaultdict(<type 'set'>, {<QualifiedName: prov:agent>: set([<QualifiedName: ex:ag1>]),
<QualifiedName: prov:plan>: set([]), <QualifiedName: prov:activity>: set([<QualifiedName: ex:a1>])})
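The same side effect can be seen with a plain defaultdict, independent of prov. In this sketch the string keys stand in for the QualifiedName objects. The second half shows that .get() reads without inserting, which is one possible way to avoid the mutation, not necessarily the fix the library adopted.

```python
from collections import defaultdict

# String keys stand in for QualifiedName objects in this sketch.
attributes = defaultdict(set)
attributes["prov:activity"].add("ex:a1")
attributes["prov:agent"].add("ex:ag1")

keys_before = sorted(attributes)

# Reading a missing key with [] calls the default factory and stores the result.
_ = attributes["prov:plan"]
keys_after = sorted(attributes)

# .get() does not trigger the default factory, so nothing is inserted.
clean = defaultdict(set)
clean["prov:activity"].add("ex:a1")
plan = clean.get("prov:plan")
```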
g = pm.ProvDocument()
c = g.collection(':c', other_attributes=[('foo:butter', "c")])
a = g.activity(':a1', startTime="2012-04-03T23:59:59Z")
gg1 = g.generation(c, a)
gg2 = g.generation(c, a, time="2012-04-03T23:59:59Z")
print "relation 1"
for attr, value in gg1.attributes:
    print attr.uri, value.uri
print "relation 2"
for attr, value in gg2.attributes:
    print attr.uri, value
results in:
relation 1
http://www.w3.org/ns/prov#entity :c
http://www.w3.org/ns/prov#activity :a1
relation 2
http://www.w3.org/ns/prov#activity :a1
http://www.w3.org/ns/prov#entity :c
http://www.w3.org/ns/prov#time 2012-04-03 23:59:59+00:00
The order of activity and entity is switched in relation 2.
I'd be interested in adding support for PROV-RDF using rdflib. Is this in the roadmap? If so, I could help add this feature with a little guidance about how to extend prov with additional serializations.
The package on PyPI seems to differ from the packages obtained via GitHub (different checksums). Moreover, it lacks some of the test data, which causes some tests to fail.
Ideally, both tarballs should be in sync or at least named differently.
Hi @satra, @trungdong,
I have continued my tests of the RDF serialisation and I think there might be an issue with prov:qualifiedGeneration.
On the PROV website, prov:qualifiedGeneration is an attribute of a prov:Entity, but when I run the RDF export I get it as an attribute of a prov:Activity:
niiri:9e0461b9-f8f3-4ea2-b9be-a215242106b7 a prov:Activity ;
rdfs:label "NIDM-Results export"^^xsd:string ;
prov:qualifiedGeneration [ a prov:Generation ;
prov:atTime "2016-10-12T13:34:38.449691"^^xsd:dateTime ] .
Previously we had:
niiri:53356c68-de61-4d95-a5e5-f3c1625f1979 a nidm_NIDMResults:,
prov:Bundle,
prov:Entity ;
rdfs:label "NIDM-Results" ;
nidm_version: "1.3.0"^^xsd:string ;
prov:qualifiedGeneration [ a prov:Generation ;
prov:activity niiri:f5611270-8992-45ea-a68f-5686ab00b29c ;
prov:atTime "10:45:52.725385"^^xsd:dateTime ] .
@lucmoreau and @trungdong - quick clarification question here:
in the example below, i feel the id (ex:bundle1) should remain in the another.org namespace, even though the content is in example.org. i couldn't find a clear place in the prov-dm document that clarifies scoping rules of prefixes. if you have a pointer that would be great.
document
prefix foo <http://example.org/>
prefix ex <http://another.org/>
entity(foo:bundle1, [prov:type="prov:Bundle" %% xsd:QName])
entity(ex:bundle2, [prov:type="prov:Bundle" %% xsd:QName])
bundle ex:bundle2
prefix ex <http://another.org/>
used(ex:use2; ex:aa1, ex:ee1, -)
activity(ex:aa1, -, -)
entity(ex:ee1)
endBundle
bundle ex:bundle1
prefix ex <http://example.org/>
used(ex:use1; ex:a1, ex:e1, -)
activity(ex:a1, -, -)
entity(ex:e1)
endBundle
endDocument
-------- Expected RDF from Web service ---------
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://another.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo: <http://example.org/> .
{
foo:bundle1 a prov:Entity , "prov:Bundle"^^xsd:QName .
ex:bundle2 a prov:Entity , "prov:Bundle"^^xsd:QName .
}
ex:bundle2 {
ex:use2 a prov:Usage ;
prov:entity ex:ee1 .
ex:aa1 prov:qualifiedUsage ex:use2 ;
a prov:Activity .
ex:ee1 a prov:Entity .
}
foo:bundle1 {
foo:use1 a prov:Usage ;
prov:entity foo:e1 .
foo:a1 prov:qualifiedUsage foo:use1 ;
a prov:Activity .
foo:e1 a prov:Entity .
}
This is an observation, but these constants do not contain the _ATTR part of their names like the other attribute constants.
A few examples so folks can see at a glance how to use the library and get an understanding for how it works.
Out of curiosity, what is the reasoning for using an object-based schema rather than an array-based one? The latter would preserve the order of records in the document and in bundles. This is only relevant when [de]serializing to other formats, but the structure is lost when going to the JSON format. I do not presume this changes the semantics or validity of the document since all the records are still present (just in a different order).
Hi @satra, @trungdong,
Could the namespaces defined using the prov toolbox be passed to the json-ld (RDF) export?
When exporting the following provn document with current master: https://provenance.ecs.soton.ac.uk/store/documents/114819/
I get a json-ld file where all namespaces are replaced by their URIs (e.g. http://purl.org/nidash/nidm# instead of nidm:): https://github.com/cmaumet/nidmresults-fsl/blob/test_jsonld/test/exported/ex_fsl_default_130/nidm.jsonld (for some reason I can't upload this document on the prov store).
Is there something specific to do for the namespaces to be passed?
Currently I am using:
jsonld_txt = self.doc.serialize(format='rdf', rdf_format='json-ld')
provn_txt = self.doc.serialize(format='provn')
I was looking into providing backports of the Debian package that Ghislain Antony Vaillant has prepared. What prevents a seamless backport to Debian stable (jessie) is the requirement for networkx >= 1.10, which was introduced in d916dbc. Unfortunately the commit message doesn't state why the version bump was required, so I decided to ask, since the package built and all tests passed with networkx 1.9 as present in Debian jessie.
Or is there some untested feature that definitely requires 1.10?
Thank you in advance for the clarification!
import prov.model as pm
g = pm.ProvDocument()
g.add_namespace('ex', 'http://example.org')
g.entity('ex:hoo', other_attributes={'ex:foo': u'{"make": "test"}'})
print(g.get_provn())
returns
document
prefix ex <http://example.org>
entity(ex:hoo, [ex:foo="{"make": "test"}"])
endDocument
this results in a parser error on the web service, because the internal quotes are not escaped.
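A minimal sketch of the kind of escaping that would avoid this, assuming backslash escapes for embedded quotes and backslashes are what the PROV-N parser expects. quote_provn_string is a hypothetical helper, not a function of the prov package.

```python
def quote_provn_string(value):
    """Wrap value in double quotes, escaping backslashes and embedded quotes."""
    # Backslashes first, so the ones added for the quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# The attribute value from the report above.
quoted = quote_provn_string('{"make": "test"}')
print(quoted)
```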
See the XML encoding of prov.tests.qnames.TestQualifiedNamesBase.test_xsd_qnames
Before:
entity(ex:e1, [prov:type="ex1:another_value" %% xsd:QName, prov:value="ex:a_value" %% xsd:QName])
After:
entity(ex:e1, [prov:type='ex1:another_value', prov:value='ex:a_value'])
PROV-XML deserializer does not handle the default namespace in provx files. For example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
<prov:entity xmlns="http://example.org/0/" prov:id="e001"/>
</prov:document>
should be deserialized as:
document
default <http://example.org/0/>
entity(e001)
endDocument
The (extra) attribute-value pairs of a record have set-like behaviour, i.e. entity(ex:a, [ex:v=2, ex:v=2]) is considered equivalent to entity(ex:a, [ex:v=2]). Therefore, attribute-value pairs are stored in a set by prov.model.
However, when adding values to a set, Python considers 2 (int) and 2.0 (float) the same, resulting in only one value being retained in the set.
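The collapse is plain Python behaviour and is easy to reproduce. The second half of this sketch keeps the values apart by storing the type name alongside each value. That is only one possible workaround, shown for illustration, and not necessarily what prov.model does.

```python
# 2 and 2.0 compare equal and hash equal, so a set keeps only one of them.
values = set()
values.add(2)
values.add(2.0)

# Pairing each value with its type name keeps the two entries distinct.
typed_values = set()
for v in (2, 2.0):
    typed_values.add((type(v).__name__, v))
```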
At present the serialization API cannot return the Python representation of the serialized form (i.e. JSON, XML, RDF). It would be nice to be able to get this intermediate representation without serializing to text and then parsing it again. This would allow an API for conversion.
I need to add an attribute like
{ve:parameters:[{"val": "../test-resources/testfiles/stations", "key": "stations_file"}]}
to an activity, in order to support activity specific parameters, which I don't want to treat as entities.
Same applies if I want to add an attribute "annotations" to an entity whose content is typically user defined. Say..
{ve:annotations:[{"val": "0.4", "key": "contribute"}]}
I.e. annotations can be produced at run-time if specific properties of the produced data are recognized.
I have noticed that adding such structured attributes to the 'other_attributes' parameter makes the serialisation fail with
File "/prov/model.py", line 321, in add_attributes
self._attributes[attr].add(value)
TypeError: unhashable type: 'list'
It does make sense to us, because of the characteristic of the provenance data we are producing.
In general I think that the api is not supporting a structure like the one shown in the EXAMPLE3 of
http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/
which should be expressed with something like
g.entity("e1",other_attributes={"ex:values": [{ "$":"1034","type":"xsd:positiveInteger"},2]})
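The TypeError comes from Python sets, which accept only hashable members. The sketch below reproduces it with a plain set and shows one possible workaround: storing the structure as a JSON string and decoding it when read back. This is an illustration under that assumption and does not use or describe the prov API.

```python
import json

attributes = set()
params = [{"val": "../test-resources/testfiles/stations", "key": "stations_file"}]

# Lists are unhashable, so they cannot be added to a set.
try:
    attributes.add(params)
    failed = False
except TypeError:
    failed = True

# A JSON string is hashable; sort_keys makes the encoding deterministic.
encoded = json.dumps(params, sort_keys=True)
attributes.add(encoded)

# Decoding restores the original structure.
decoded = json.loads(encoded)
```

The trade-off is that the value is then an opaque string as far as PROV is concerned.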
Even if just PROV-JSON is supported currently for loading or saving PROV, it would be useful to have a method like "prov.model.load()" to read and write those files/strings, rather than the current implementation trick of passing along a JSONDecoder:
bundle = json.load(open("provenance.json"), cls=prov.model.ProvBundle.JSONDecoder)
Would it be possible to make the lxml package requirement optional, similar to the pydotplus already used?
Currently it is required, but as far as I can see it is only needed for the provxml serializer. We use prov with json and do not need xml support, but requirement of lxml makes the install a lot more difficult and time consuming.
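A common pattern for an optional dependency is to guard the import and advertise the feature only when the import succeeds. This is a sketch under that assumption. available_formats and the format names are hypothetical and do not reflect how prov registers its serializers.

```python
# Guarded import: the module still loads when lxml is absent.
try:
    from lxml import etree
    HAS_LXML = True
except ImportError:
    etree = None
    HAS_LXML = False


def available_formats():
    """List the formats usable in this environment (hypothetical helper)."""
    formats = ["json"]
    if HAS_LXML:
        formats.append("xml")
    return formats
```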
Some of the .ttl test files added in 1471517 use xsd:QName, which is no longer valid as a type for qualified names.
All such literals should be just prefixed names in Turtle.
in python 2.6 this line fails.
File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/prov/model.py", line 407
self._attributes[PROV_ATTR_STARTTIME] = {startTime}
we use prov in our nipype project where travis checks against 2.6 and 2.7 - 2.7 passes but 2.6 fails.
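Set literals such as {startTime} were introduced in Python 2.7, so Python 2.6 reports a syntax error on that line. The set([...]) spelling builds the same set and parses on both versions. A small sketch, in which start_time is just a placeholder value:

```python
start_time = "2012-04-03T23:59:59Z"

# Parses on Python 2.6 as well as on later versions.
compatible = set([start_time])

# Set literal syntax, available from Python 2.7 onwards.
literal = {start_time}
```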
prov.model currently uses strong references to other records in relations. This introduced significant complexity into the handling of missing records.
This issue is to document the refactoring to remove strong references (and inferred records) and to replace them with QNames.