trungdong / prov

A Python library for W3C Provenance Data Model (PROV)

Home Page: http://prov.readthedocs.io/

License: MIT License

Makefile 0.41% Python 99.59%

prov prov-dm prov-json prov-o provenance provenance-model python w3c

prov's Issues

Manpages for scripts

Traditionally, all Unix programs are documented with manual pages, and Linux distributions often recommend providing a corresponding manpage for each installed command-line utility.

It would be nice to improve the prov package so it generates the respective manpage for each script. For instance, this could be achieved by integration of a pair of custom build and install setuptools commands responsible for calling help2man on the scripts.

Cheers,

Don't use int constants (for attributes, etc)

This does not make sense to me when reading in a PROV-JSON file:

>>> bundle.get_records(ProvActivity)[0].get_attributes()[0].keys()
[101, 102]

How am I meant to know what 101 and 102 are? Look up among the hardcoded constants?

I don't see what purpose these integer constants serve (perhaps this is a leftover from a C-version of the code?), so I would get rid of them and simply use normal strings like "startTime" or "prov:startTime".

spurious files (dev/, some prov/serializers/) in released pypi tarballs

While reviewing a Debian package for 1.5.0 release I have noted that there are some files in the tarball (fetched from pypi) which aren't within this git repository for 1.5.0 release/tag.

diff -Naur --exclude=.git prov 1.5.0-1/python-prov-1.5.0 | diffstat | grep -v -e 'debian/' -e '\.pc/' -e 'egg-info' -e PKG-I | grep '\+'
 dev/debug_jsons.py                   |   58 ++
 dev/script.py                        |   99 +++
 dev/test_count_JSON-LD.py            |   38 +
 dev/test_count_Streaming.py          |   82 +++
 prov/serializers/jsonld_streaming.py |  209 +++++++
 prov/serializers/provjsons.py        |  368 ++++++++++++++

my guess is that release tarball was generated on a development box in a non-clean git repository so those files got picked up as well... just wanted to let you know.

(since git clones are so cheap, paranoid me usually quickly clones the repo for running 'sdist' command ;) )

assertEqual of identical bundles containing any prov.model.Literal attributes will fail

Failing test:

# coding: utf8
import unittest
from prov.model import *
import datetime

class FailingTest(unittest.TestCase):
    def setUp(self):
        pass

    def test_inserting_arbitrary_attributes(self):
        g = ProvBundle()

        attributes = {
            'ex:int': 100,
            'ex:float': 100.123456,
            'ex:str': 'Some string',
            'ex:unicode': u'Some unicode string with accents: Huỳnh Trung Đông',
            'ex:timedate': datetime.datetime(2012, 12, 12, 14, 7, 48),
            # Literal comes from the star import above; the name `prov` is not bound here
            'ex:literal': Literal("hi", datatype="xsd:notImplemented")
        }

        g.entity('ex:req3', attributes)

        h = ProvBundle()

        attributes = {
            'ex:int': 100,
            'ex:float': 100.123456,
            'ex:str': 'Some string',
            'ex:unicode': u'Some unicode string with accents: Huỳnh Trung Đông',
            'ex:timedate': datetime.datetime(2012, 12, 12, 14, 7, 48),
            'ex:literal': Literal("hi", datatype="xsd:notImplemented")
        }

        h.entity('ex:req3', attributes)

        # Two bundles built from identical input are expected to compare equal
        self.assertEqual(h, g)

if __name__ == '__main__':
    unittest.main()

Feature Request: capability to add new record to an existing bundle or add new attributes to existing record.

Let's say at time t0, I have a bundle with an entity E1 and an activity A1.
activity A1 has a prov:startTime.

Later, at time t1, I want to add a new entity E2 to that bundle and I want to add prov:endTime to the activity A1.

Since I am not going to write the full provenance data in one shot, I would like to use something like "save_bundle", but one that calls, for instance:
PDRecord.objects.get_or_create instead of PDRecord.objects.create
and
PDBundle.objects.get_or_create instead of PDBundle.objects.create

difference between online translator and this library

using the following example as input to this library and the online translator there is a difference in how qnames are handled. is there a definitive answer? (i would prefer what the online translator did)

https://github.com/trungdong/prov/blob/master/prov/tests/json/activity6.json

the main difference is how the element with type xsd:QName is handled.

Prov library

document
  prefix ex <http://example.org/>

  activity(ex:a6, -, -, [prov:location=1, prov:location="ex:london" %% xsd:QName, 
 prov:location="London", prov:location="1.0" %% xsd:float, 
 prov:location=2014-06-23T12:28:53.858000+01:00,
 prov:location="2002" %% xsd:gYear, prov:label="activity6"])
endDocument

Online translator

document
prefix xsd <http://www.w3.org/2001/XMLSchema>
prefix ex <http://example.org/>
activity(ex:a6,-,-,[prov:label = "activity6", prov:location = "London" %% xsd:string, prov:location = "1" %%
 xsd:int, prov:location = "1.0" %% xsd:float, prov:location = "true" %% xsd:boolean, prov:location =
 'ex:london', prov:location = "2014-06-23T12:28:53.858+01:00" %% xsd:dateTime, prov:location =
 "http://example.org/london" %% xsd:anyURI, prov:location = "2002" %% xsd:gYear])
endDocument

bundle example for branch 1.x

@trungdong - is there an example for bundle that prints out a bundle statement? for example could you write an example that generates the following provn representation?

document
bundle agg:bundle3
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)

  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle

entity(agg:bundle3, [ prov:type='prov:Bundle' ])
agent(ex:aggregator01, [ prov:type='ex:Aggregator' ])
wasAttributedTo(agg:bundle3, ex:aggregator01)
wasDerivedFrom(agg:bundle3, bob:bundle1)
wasDerivedFrom(agg:bundle3, alice:bundle2)
endDocument

jsonld through rdflib vs on its own

@trungdong - is there a reason why you created a separate jsonld branch instead of trying it through the rdf branch?

TypeError while retrieving ProvDocuments from ProvStore using Py3

Hello,

I tried out the tutorial given at http://nbviewer.ipython.org/github/trungdong/notebooks/blob/master/PROV%20Tutorial.ipynb, and everything is very nicely explained. But I always got exceptions when trying to retrieve ProvDocuments from ProvStore, looking like this:

TypeError: initial_value must be str or None, not bytes

This exception was raised by the deserializer from model.py because the content it tried to parse was bytes. I found the origin of the problem in the request call at line 106, r= self._request(...), in api.py from the provstore-api module: it returns the content as bytes, which triggers the error in the io.StringIO function in the deserializer, since that function accepts only unicode. I simply decoded the content as UTF-8 right before the deserializer is called from the api, and everything worked again. I don't know what is responsible for this incompatibility; maybe Windows 7, which I'm using? Anyway, please check the compatibility of these modules again.

Best wishes
Bojan
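
The failure described in this report can be reproduced with the standard library alone. The sketch below assumes the response body is UTF-8 encoded JSON; `read_text_payload` is a hypothetical helper for illustration and is not part of prov or provstore-api.

```python
import io
import json


def read_text_payload(content, encoding="utf-8"):
    """Return a text stream for content that may be bytes or str.

    Hypothetical helper: bytes are decoded first, because io.StringIO
    only accepts str (or None) as its initial value.
    """
    if isinstance(content, bytes):
        content = content.decode(encoding)
    return io.StringIO(content)


raw = b'{"entity": {"ex:e1": {}}}'  # stand-in for an HTTP response body

error_message = None
try:
    io.StringIO(raw)  # bytes are rejected on Python 3
except TypeError as exc:
    error_message = str(exc)

# Decoding before wrapping the content avoids the TypeError.
document = json.load(read_text_payload(raw))
```

Decoding at the boundary where the HTTP response is received keeps the deserializer itself working on text only.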

Initial Update

Hi 👊

This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.

Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.

That's it for now!

Happy merging! 🤖

Tests with unicode strings failed

Saving literals with unicode string to a database fails with the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u1ef3' in position 180: ordinal not in range(128)
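
The error above is the typical symptom of text being encoded with the ASCII codec somewhere on the way to storage. The following is a minimal Python 3 sketch of the same failure and of encoding explicitly as UTF-8 instead; it does not claim to show where the implicit encoding happens in the persistence layer.

```python
# The accented name from the test data, written with escapes for clarity.
text = "Hu\u1ef3nh Trung \u0110\u00f4ng"

failed_character = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that cannot be encoded.
    failed_character = text[exc.start]

# An explicit UTF-8 round trip preserves every character.
encoded = text.encode("utf-8")
round_tripped = encoded.decode("utf-8")
```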

Use prov:QUALIFIED_NAME instead of prov:QualifiedName

prov:QualifiedName is not defined by any of the PROV documents, while PROV-N specifies that prov:QUALIFIED_NAME is the type for qualified name literals:

The non terminals STRING_LITERAL, INT_LITERAL, and QUALIFIED_NAME_LITERAL are syntactic sugar for quoted strings with datatype xsd:string, xsd:int, and prov:QUALIFIED_NAME respectively.

PROV-N serialization of wasInfluencedBy includes an unnecessary "-"

In the example below the PROV-N serialization of the wasInfluencedBy relation includes a "-".

In [1]: import prov.model as prov
In [2]: g = prov.ProvBundle()
In [3]: g.entity("e1")
Out[3]: <prov.model.ProvEntity at 0x101f2d550>
In [4]: g.entity("e2")
Out[4]: <prov.model.ProvEntity at 0x101f2d810>
In [5]: g.wasInfluencedBy("e1","e2")
Out[5]: <prov.model.ProvInfluence at 0x101f2d9d0>
In [6]: print g
document

  entity(e1)
  entity(e2)
  wasInfluencedBy(e1, e2, -)
endDocument

Based on the PROV-DM examples for this relation at http://www.w3.org/TR/prov-dm/#dfn-wasinfluencedby, it appears that this "-" is unnecessary.

The issue arises when using the ProvToolbox to convert this PROV-N to RDF.

PROVCONVERT code says:
TestPreProcessSHORT_FilledInJobRECURSIVE.provn line 256:54 mismatched input '-' expecting OPEN_SQUARE_BRACE

If I go through the PROVN file and remove the ,- (comma-dashes) then there is no problem. The same dashes are there for the "USED" function but the PROVCONVERT program has no problem with them.

Is this a bug in the PROV Python library or ProvToolbox?

LiteralAttribute does not support langtag

International strings will lose the langtags as LiteralAttribute currently does not support it.

Expand the graph vertically

Hi, I am just wondering whether the exported graph can be expanded vertically. Currently (maybe by default) it expands horizontally. Thanks!

Prefixes (Namespaces) for alphanumeric identifiers not passed to turtle serialisation

Hi @satra, @trungdong,

I am trying to use Namespace to specify prefixes for our alphanumeric identifiers, and though this works fine with provn, I can't get it to work with the turtle serialisation. Any help would be greatly appreciated!

Here is a minimal example:

from prov.model import ProvDocument, Namespace, QualifiedName
import urllib.request
import csv

# Get a list of preferred prefixes from an online csv file
csv_url = "https://raw.githubusercontent.com/incf-nidash/nidm/master/nidm/nidm-results/terms/prefixes.csv"
prefix_file = urllib.request.urlopen(
    csv_url).read().decode('utf-8').splitlines()
prefixes = dict()
reader = csv.reader(prefix_file)
for alphanum_id, prefix, uri in reader:
    prefixes[uri] = Namespace(prefix, uri)['']

## Example of prefixes dictionary
# prefixes = dict(
#     [("http://purl.org/nidash/nidm#NIDM_0000170",
#       Namespace('nidm_groupName',
#                 'http://purl.org/nidash/nidm#NIDM_0000170')['']),
#      ("http://purl.org/nidash/nidm#NIDM_0000165",
#       Namespace('nidm_NIDMResultsExporter',
#                 'http://purl.org/nidash/nidm#NIDM_0000165')[''])],
#     )

g = ProvDocument()

group_name_uri = "http://purl.org/nidash/nidm#NIDM_0000170"
ex = Namespace('ex', 'http://example/')

g.entity(ex['group1'], {prefixes[group_name_uri]: "Group 1"})

print(g.serialize(format='provn'))
print("---")
print(g.serialize(format='rdf', rdf_format="turtle"))

and the output:

document
  prefix ex <http://example/>
  prefix nidm_groupName <http://purl.org/nidash/nidm#NIDM_0000170>

  entity(ex:group1, [nidm_groupName:="Group 1"])
endDocument

---
@prefix ex: <http://example/> .
@prefix nidm_groupName: <http://purl.org/nidash/nidm#NIDM_0000170> .
@prefix ns1: <http://purl.org/nidash/nidm#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:group1 a prov:Entity ;
    ns1:NIDM_0000170 "Group 1"^^xsd:string .

I would like ns1:NIDM_0000170 to be replaced by nidm_groupName: in the turtle document.

internal representation does not distinguish between literal with and without datatype

the following are stored internally simply as str, so on re-serialization there is no way to know which to output with or without datatype.

ex:a9 ex:tag1 "hello"^^xsd:string .

and
ex:a9 ex:tag1 "hello" .

Ability to add comments to prov model

Currently there is no way to add a comment to a provenance model

Naive datetime warning when Django time zone support is enabled

Warning by Django 1.5.1:

site-packages/django/db/models/fields/__init__.py:827: RuntimeWarning: DateTimeField received a naive datetime (2013-07-04 00:00:00) while time zone support is active.
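
Django emits this warning when a datetime without tzinfo reaches a DateTimeField while time zone support is on. A minimal standard-library sketch of attaching a zone before saving follows; it assumes naive values are meant as UTC, which may not hold for every data source, and the local `make_aware` helper is illustrative only.

```python
import datetime


def make_aware(value, tz=datetime.timezone.utc):
    """Attach a time zone to a naive datetime; return aware values unchanged.

    Assumption: naive values represent UTC unless another tz is passed.
    """
    if value.tzinfo is None or value.tzinfo.utcoffset(value) is None:
        return value.replace(tzinfo=tz)
    return value


naive = datetime.datetime(2013, 7, 4, 0, 0, 0)  # the value from the warning
aware = make_aware(naive)
```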

Extra attributes on prov:type attributes

Hi all,

this is not an issue but rather a question and probably a pretty easy one as well...

I am trying to add an XML (de)serializer to this package (will send a PR once ready) and it seems to work fairly well except one little detail that I cannot figure out.

The prov:type PROV attributes on the XML spec page (http://www.w3.org/TR/prov-xml/) usually also have an xsi:type attribute as well, e.g. in example 6:

<prov:document
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#"
    xmlns:tr="http://example.com/ns/tr#">

  <prov:entity prov:id="tr:WD-prov-dm-20111215">
    <prov:type xsi:type="xsd:QName">document</prov:type>
    <ex:version>2</ex:version>
  </prov:entity>

</prov:document>

Where does the xsi:type="xsd:QName" attribute go in the prov package's internal data model? The rest is easy but this detail for some reason eludes me.

import prov.model as prov

EX_NS = ('ex', 'http://example.org/ns/ex#')
EX_TR = ('tr', 'http://example.org/ns/tr#')
EX_XSI = ('xsi', 'http://www.w3.org/2001/XMLSchema-instance')

document = prov.ProvDocument()
document.add_namespace(*EX_NS)
document.add_namespace(*EX_TR)
document.add_namespace(*EX_XSI)

document.entity("tr:WD-prov-dm-20111215", (
    (prov.PROV_TYPE, "document"),
    ("ex:version", "2")
))

Thanks!

ProvRecord validate method to assert required attributes

Most of the PROV concepts have at least one attribute required for it to be considered valid. Is there any desire to integrate a method like this?

foaf:name not allowed as entity attribute

Howdy! Thanks for this great library. I'm completely new to W3C PROV and the prov library, and this library has made life quite a bit easier.

I believe the example below should work, but I'm not sure. I'm trying to add the "foaf:name" attribute to an entity, but I get an error saying that I'm missing a PROV identifier. "foaf:givenName" works but not "foaf:name". Am I doing something wrong or is this indeed a bug?

doc = prov.ProvDocument()
e1 = iris_doc.entity("ex:entity1", {
        'foaf:name': "Entity One",
    })
---------------------------------------------------------------------------
ProvElementIdentifierRequired             Traceback (most recent call last)
<ipython-input-36-0ce33a77955a> in <module>()
      1 doc = prov.ProvDocument()
      2 e1 = iris_doc.entity("ex:entity1", {
----> 3         'foaf:name': "Entity One",
      4     })

/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in entity(self, identifier, other_attributes)
   1182 
   1183     def entity(self, identifier, other_attributes=None):
-> 1184         return self.new_record(PROV_ENTITY, identifier, None, other_attributes)
   1185 
   1186     def activity(self, identifier, startTime=None, endTime=None,

/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in new_record(self, record_type, identifier, attributes, other_attributes)
   1170             )
   1171         new_record = PROV_REC_CLS[record_type](
-> 1172             self, self.valid_qualified_name(identifier), attr_list
   1173         )
   1174         self._add_record(new_record)

/Users/aterrel/miniconda/envs/boldmetrics/lib/python2.7/site-packages/prov/model.pyc in __init__(self, bundle, identifier, attributes)
    458         if identifier is None:
    459             # All types of PROV elements require a valid identifier
--> 460             raise ProvElementIdentifierRequired()
    461 
    462         super(ProvElement, self).__init__(bundle, identifier, attributes)

ProvElementIdentifierRequired: An identifier is missing. All PROV elements require a valid identifier.

Skip tests requiring pydotplus if the latter is not installed

Since pydotplus is considered an optional dependency, the testsuite should be able to skip the tests depending on it, if the latter is not installed.
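
One way to do this with unittest alone is sketched below. The test case name and body are hypothetical; only the skip condition is the point.

```python
import importlib.util
import unittest

# True only if the optional dependency can be imported in this environment.
HAS_PYDOTPLUS = importlib.util.find_spec("pydotplus") is not None


class DotExportTest(unittest.TestCase):  # hypothetical test case
    @unittest.skipUnless(HAS_PYDOTPLUS, "pydotplus is not installed")
    def test_requires_pydotplus(self):
        import pydotplus  # imported lazily, only when available

        self.assertIsNotNone(pydotplus)


# Run the case programmatically to show that it is skipped, not failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DotExportTest)
result = unittest.TestResult()
suite.run(result)
```

With this pattern the suite reports a skip instead of an ImportError when the package is absent.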

Convert prov:location to prov:atLocation for RDF

Per conversation with @satra, the RDF branch should test that a prov:location in prov-dm should be converted to prov:atLocation.

Also, if the value of prov:location is a URI, and the URI exists as a subject in document, do we infer that the URI is also a prov:Location (class) when converting to RDF?

Constants module is named "contants"

Literal comparison for Decimal values

@trungdong in prov/model.py the way literal comparison is implemented, it results in:

pm.Literal(10, datatype=pm.XSD_DECIMAL) != pm.Literal(10.0, datatype=pm.XSD_DECIMAL)

is that intended?
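
Since 10 == 10.0 holds for plain Python numbers, one possible explanation is that the comparison involves the lexical form of the value; this is an assumption and has not been checked against prov/model.py. The sketch below shows how normalising through decimal.Decimal would make the two values equal. `decimal_value` is a hypothetical helper.

```python
from decimal import Decimal


def decimal_value(raw):
    """Convert a raw literal value to Decimal via its string form.

    Going through str() avoids binary floating-point artefacts,
    e.g. Decimal(0.1) != Decimal("0.1").
    """
    return Decimal(str(raw))


# Lexical forms differ: "10" vs "10.0".
raw_equal = str(10) == str(10.0)

# Numeric values agree once both sides are parsed as decimals.
numeric_equal = decimal_value(10) == decimal_value(10.0)
```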

Failed to install index for persistence.PDRecord model on MySQL backend.

When using a MySQL backend (Version 14.14 Distrib 5.5.29 for debian-linux-gnu (x86_64)), I got the following error at database creation:
Failed to install index for persistence.PDNamespace model: (1170, "BLOB/TEXT column 'uri' used in key specification without a key length")
Failed to install index for persistence.PDRecord model: (1170, "BLOB/TEXT column 'rec_id' used in key specification without a key length")

how to run unit tests?

@trungdong - how do you run unit tests for prov?

accessing formal attributes changes underlying graph

following up on the previous issue with None on current master

g = pm.ProvDocument()
g.add_namespace(pm.Namespace('ex', 'http://example.org/'))
a1 = g.activity('ex:a1')
ag1 = g.agent('ex:ag1')
as1 = g.wasAssociatedWith(a1, agent=ag1)
print(g.get_provn())
formal_objects = []
for key, val in as1.formal_attributes:
    formal_objects.append(key)
print(g.get_provn())

results in:

document
  prefix ex <http://example.org/>

  activity(ex:a1, -, -)
  agent(ex:ag1)
  wasAssociatedWith(ex:a1, ex:ag1, -)
endDocument
document
  prefix ex <http://example.org/>

  activity(ex:a1, -, -)
  agent(ex:ag1)
  wasAssociatedWith(ex:a1, ex:ag1, None)
endDocument

this happens because accessing the property formal_attributes changes self._attributes. since _attributes is a defaultdict(set) simply accessing an attribute makes it part of it:

g = pm.ProvDocument()
g.add_namespace(pm.Namespace('ex', 'http://example.org/'))
a1 = g.activity('ex:a1')
ag1 = g.agent('ex:ag1')
as1 = g.wasAssociatedWith(a1, agent=ag1)
print(as1._attributes)
print(as1.formal_attributes)
print(as1._attributes)

results in:

defaultdict(<type 'set'>, {<QualifiedName: prov:agent>: set([<QualifiedName: ex:ag1>]), 
<QualifiedName: prov:activity>: set([<QualifiedName: ex:a1>])})
((<QualifiedName: prov:activity>, <QualifiedName: ex:a1>), (<QualifiedName: prov:agent>, 
<QualifiedName: ex:ag1>), (<QualifiedName: prov:plan>, None))
defaultdict(<type 'set'>, {<QualifiedName: prov:agent>: set([<QualifiedName: ex:ag1>]), 
<QualifiedName: prov:plan>: set([]), <QualifiedName: prov:activity>: set([<QualifiedName: ex:a1>])})
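
The defaultdict behaviour described above can be shown in isolation. This is a minimal sketch using plain strings as keys rather than QualifiedName objects; it illustrates the mechanism only and is not the library's code.

```python
from collections import defaultdict

attributes = defaultdict(set)
attributes["prov:activity"].add("ex:a1")
attributes["prov:agent"].add("ex:ag1")

keys_before = sorted(attributes)

# Safe lookup: .get() never triggers the default factory.
plan = attributes.get("prov:plan")
keys_after_get = sorted(attributes)

# Unsafe lookup: indexing a missing key inserts an empty set as a side effect.
attributes["prov:plan"]
keys_after_index = sorted(attributes)
```

Reading through `.get()` (or testing with `in` first) keeps a read-only property from changing the record.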

attribute order error in branch 1.x

g = pm.ProvDocument()
c = g.collection(':c', other_attributes=[('foo:butter', "c")])
a = g.activity(':a1', startTime="2012-04-03T23:59:59Z")
gg1 = g.generation(c, a)
gg2 = g.generation(c, a, time="2012-04-03T23:59:59Z")

print "relation 1"
for attr, value in gg1.attributes:
    print attr.uri, value.uri

print "relation 2"
for attr, value in gg2.attributes:
    print attr.uri, value

results in:

relation 1
http://www.w3.org/ns/prov#entity :c
http://www.w3.org/ns/prov#activity :a1
relation 2
http://www.w3.org/ns/prov#activity :a1
http://www.w3.org/ns/prov#entity :c
http://www.w3.org/ns/prov#time 2012-04-03 23:59:59+00:00

the order of activity and entity are switched in relation 2

Feature Request: PROV-RDF Serialization

I'd be interested in adding support for PROV-RDF using rdflib. Is this in the roadmap? If so, I could help add this feature with a little guidance about how to extend prov with additional serializations.

PyPI packages lacks test data.

The package on PyPI seems to differ from the packages obtained via GitHub (different checksums). Moreover, it lacks some of the test data, which causes some tests to fail.

Ideally, both tarballs should be in sync or at least named differently.

QualifiedGeneration for RDF serialisation attribute of Activity instead of Entity

Hi @satra, @trungdong,

I have continued my tests of the RDF serialisation and I think there might be an issue with the prov:qualifiedGeneration.

On the PROV website the prov:qualifiedGeneration is an attribute of a prov:Entity but when I run the RDF export I get it as an attribute of a prov:Activity:

niiri:9e0461b9-f8f3-4ea2-b9be-a215242106b7 a prov:Activity ;
    rdfs:label "NIDM-Results export"^^xsd:string ;
    prov:qualifiedGeneration [ a prov:Generation ;
    prov:atTime "2016-10-12T13:34:38.449691"^^xsd:dateTime ] .

Previously we had:

niiri:53356c68-de61-4d95-a5e5-f3c1625f1979 a nidm_NIDMResults:,
        prov:Bundle,
        prov:Entity ;
        rdfs:label "NIDM-Results" ;
        nidm_version: "1.3.0"^^xsd:string ;
        prov:qualifiedGeneration [ a prov:Generation ;
        prov:activity niiri:f5611270-8992-45ea-a68f-5686ab00b29c ;
        prov:atTime "10:45:52.725385"^^xsd:dateTime ] .

clarification on bundle namespace scoping

@lucmoreau and @trungdong - quick clarification question here:

in the example below, i feel the id (ex:bundle1) should remain in the another.org namespace, even though the content is in example.org. i couldn't find a clear place in the prov-dm document that clarifies scoping rules of prefixes. if you have a pointer that would be great.

document
  prefix foo <http://example.org/>
  prefix ex <http://another.org/>

  entity(foo:bundle1, [prov:type="prov:Bundle" %% xsd:QName])
  entity(ex:bundle2, [prov:type="prov:Bundle" %% xsd:QName])
  bundle ex:bundle2
    prefix ex <http://another.org/>

    used(ex:use2; ex:aa1, ex:ee1, -)
    activity(ex:aa1, -, -)
    entity(ex:ee1)
  endBundle
  bundle ex:bundle1
    prefix ex <http://example.org/>

    used(ex:use1; ex:a1, ex:e1, -)
    activity(ex:a1, -, -)
    entity(ex:e1)
  endBundle
endDocument

-------- Expected RDF from Web service ---------

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://another.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo: <http://example.org/> .

{
    foo:bundle1 a prov:Entity , "prov:Bundle"^^xsd:QName .
    ex:bundle2 a prov:Entity , "prov:Bundle"^^xsd:QName .
}

ex:bundle2 {
    ex:use2 a prov:Usage ;
        prov:entity ex:ee1 .
    ex:aa1 prov:qualifiedUsage ex:use2 ;
        a prov:Activity .
    ex:ee1 a prov:Entity .
}

foo:bundle1 {
    foo:use1 a prov:Usage ;
        prov:entity foo:e1 .
    foo:a1 prov:qualifiedUsage foo:use1 ;
        a prov:Activity .   
    foo:e1 a prov:Entity .
}

Change type, label, and value constant names for consistency

This is an observation, but these constants do not contain the _ATTR part of their names like the other attribute constants.

Add a bit of usage documentation to README

A few examples so folks can see at a glance how to use the library and get an understanding for how it works.

Preservation of record ordering in JSON schema

Out of curiosity, what is the reasoning for using an object-based schema rather than an array-based one? The latter would preserve the order of records in the document and in bundles. This is only relevant when [de]serializing to other formats, but the structure is lost when going to the JSON format. I do not presume this changes the semantics or validity of the document since all the records are still present (just in a different order).
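
The difference can be illustrated with the standard json module. The object-based layout below mirrors how PROV-JSON groups records by type and keys them by identifier; the array-based layout is a hypothetical alternative, and the identifiers are made up for the example.

```python
import json

# Records in the order they were asserted (illustrative identifiers).
records = [
    ("entity", "ex:e1", {}),
    ("activity", "ex:a1", {}),
    ("entity", "ex:e2", {}),
]

# Object-based layout: one JSON object per record type, keyed by identifier.
by_type = {}
for record_type, identifier, attributes in records:
    by_type.setdefault(record_type, {})[identifier] = attributes

# Array-based layout (hypothetical): one item per record, in order.
as_array = [
    {"type": record_type, "id": identifier, "attributes": attributes}
    for record_type, identifier, attributes in records
]

restored_from_object = [
    (record_type, identifier)
    for record_type, members in json.loads(json.dumps(by_type)).items()
    for identifier in members
]
restored_from_array = [
    (item["type"], item["id"]) for item in json.loads(json.dumps(as_array))
]
```

After the round trip, the object-based form yields both entities before the activity, while the array-based form keeps the original interleaving.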

Namespaces in RDF serialisation

Hi @satra, @trungdong,

Could the namespaces defined using the prov toolbox be passed to the json-ld (RDF) export?

When exporting the following provn document: https://provenance.ecs.soton.ac.uk/store/documents/114819/
with current master I get a json-ld file where all namespaces are replaced by their URIs (e.g. http://purl.org/nidash/nidm# instead of nidm:): https://github.com/cmaumet/nidmresults-fsl/blob/test_jsonld/test/exported/ex_fsl_default_130/nidm.jsonld (for some reason I can't upload this document on the prov store).

Is there something specific to do for the namespaces to be passed?

Currently I am using:

jsonld_txt = self.doc.serialize(format='rdf', rdf_format='json-ld')
provn_txt = self.doc.serialize(format='provn')

relax version requirement for networkx to be 1.9 or may be even lower?

I was looking into providing backports of the Debian package that Ghislain Antony Vaillant has prepared. One obstacle to seamless backporting to Debian stable/jessie is the requirement networkx >= 1.10, which was introduced in d916dbc. Unfortunately the commit message doesn't state why the version bump was required, so I decided to ask, since the package built and all tests passed with networkx 1.9 as present on Debian jessie.
Or is there some untested feature that definitely requires 1.10?

Thank you in advance for the clarification!

string literals with quotation marks

import prov.model as pm
g = pm.ProvDocument()
g.add_namespace('ex', 'http://example.org')
g.entity('ex:hoo', other_attributes={'ex:foo': u'{"make": "test"}'})
print(g.get_provn())

returns

document
  prefix ex <http://example.org>

  entity(ex:hoo, [ex:foo="{"make": "test"}"])
endDocument

this results in a parser error on the web service, because the internal quotes are not escaped.
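
A minimal sketch of the escaping step follows. It assumes PROV-N accepts backslash-escaped quotes inside double-quoted string literals, in the same way as SPARQL and Turtle; `escape_provn_string` is a hypothetical helper, not a function of the prov package, and it does not handle newlines.

```python
def escape_provn_string(value):
    """Escape backslashes and double quotes for a double-quoted literal.

    Backslashes are escaped first so that the backslashes added for
    the quotes are not doubled afterwards.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


raw = '{"make": "test"}'
literal = '"' + escape_provn_string(raw) + '"'
```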

XML encoding does not retain xsd:QName type

See the XML encoding of prov.tests.qnames.TestQualifiedNamesBase.test_xsd_qnames

Before:
entity(ex:e1, [prov:type="ex1:another_value" %% xsd:QName, prov:value="ex:a_value" %% xsd:QName])

After:
entity(ex:e1, [prov:type='ex1:another_value', prov:value='ex:a_value'])

Handling the default namespace in XML

PROV-XML deserializer does not handle the default namespace in provx files. For example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
    <prov:entity xmlns="http://example.org/0/" prov:id="e001"/>
</prov:document>

should be deserialized as:

document
    default <http://example.org/0/>
    entity(e001)
endDocument

Merging of attributes of the same value but different types

The (extra) attribute-value pairs of a record have set-like behaviour, i.e. entity(ex:a, [ex:v=2, ex:v=2]) is considered equivalent to entity(ex:a, [ex:v=2]). Therefore, attribute-value pairs are stored in a set by prov.model.

However, when adding values to a set, Python considers 2 (int) and 2.0 (float) the same, so only one of the two values is retained in the set.
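
The collapse happens because 2 == 2.0 and both hash to the same value. The sketch below shows the effect and one possible workaround, pairing each value with its type name; this is an illustration, not necessarily how prov.model resolves it.

```python
# 2 and 2.0 compare equal and share a hash, so the set keeps a single element.
values = {2, 2.0}

# Pairing each value with its type name keeps the two apart.
typed_values = {(type(v).__name__, v) for v in (2, 2.0)}
```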

return python representation for serialization types

at present the serialization api doesn't allow to return the python representation for the serialized form (i.e, json, xml, rdf). it would be nice to be able to get the internal representation without actually serializing to print form and then parsing it. this would allow an api for conversion.

including other dictionaries/lists as values of the additional attributes

I need to add an attribute like
{ve:parameters:[{"val": "../test-resources/testfiles/stations", "key": "stations_file"}]}
to an activity, in order to support activity specific parameters, which I don't want to treat as entities.

Same applies if I want to add an attribute "annotations" to an entity whose content is typically user defined. Say..
{ve:annotations:[{"val": "0.4", "key": "contribute"}]}

I.e. Annotations can be produced at run-time if specific property of the produced data are recognized.

I have noticed that adding such structured attributes to the 'other_attributes' parameter makes the serialisation fail with

File "/prov/model.py", line 321, in add_attributes
self._attributes[attr].add(value)
TypeError: unhashable type: 'list'

It does make sense to us, because of the characteristic of the provenance data we are producing.

In general I think that the api is not supporting a structure like the one shown in the EXAMPLE3 of
http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/
which should be expressed with something like

g.entity("e1",other_attributes={"ex:values": [{ "$":"1034","type":"xsd:positiveInteger"},2]})
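
The TypeError in the traceback above comes from Python itself: values stored in a set must be hashable, and lists are not. A minimal sketch of the failure and of one workaround, storing the structure as a JSON string, is given below; whether a string-encoded value is acceptable for the use case described here is an open question.

```python
import json

attribute_values = set()
structured = [{"val": "0.4", "key": "contribute"}]

error_message = None
try:
    attribute_values.add(structured)  # lists are mutable, hence unhashable
except TypeError as exc:
    error_message = str(exc)

# Workaround sketch: store a canonical JSON string instead of the list.
encoded = json.dumps(structured, sort_keys=True)
attribute_values.add(encoded)
decoded = json.loads(next(iter(attribute_values)))
```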

Methods for reading/writing PROV

Even if just PROV-JSON is supported currently for loading or saving PROV, it would be useful to have a method like "prov.model.load()" to read and write those files/strings, rather than the current implementation trick of passing along a JSONDecoder:

bundle = json.load(open("provenance.json"), cls=prov.model.ProvBundle.JSONDecoder)

lxml as optional dependency instead of requirement

Would it be possible to make the lxml package requirement optional, similar to the pydotplus already used?

Currently it is required, but as far as I can see it is only needed for the provxml serializer. We use prov with json and do not need xml support, but requirement of lxml makes the install a lot more difficult and time consuming.

xsd:QName should not be in RDF test files

Some of the .ttl test files added in 1471517 use xsd:QName, which is no longer valid as a type for qualified names.
All such literals should be just prefixed names in Turtle.

py 2.6 failure

in python 2.6 this line fails.

   File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/prov/model.py", line 407
    self._attributes[PROV_ATTR_STARTTIME] = {startTime}

we use prov in our nipype project where travis checks against 2.6 and 2.7 - 2.7 passes but 2.6 fails.
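
The reported line uses a set literal, a syntax that was introduced in Python 2.7 and is a SyntaxError on 2.6. The sketch below shows the equivalent constructor call, which also parses on 2.6; the variable names are illustrative.

```python
start_time = "2012-04-03T23:59:59Z"

# Set literal: valid on Python 2.7+ and 3.x only.
literal_form = {start_time}

# Constructor call: same result, and the syntax is also accepted by Python 2.6.
constructor_form = set([start_time])
```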

Reference refactoring

prov.model currently uses strong references to other records in relations. This introduced significant complexity into the handling of missing records.

This issue is to document the refactoring to remove strong references (and inferred records) and to replace them with QNames.
