zazuko / clownface

Simple but powerful graph traversing library for RDF

Home Page: https://zazuko.github.io/clownface/

JavaScript 100.00%

gremlin rdf sparql graph-traversal linked-data graph-traversing-library

clownface's Issues

Create term with the right xsd datatype when calling .literal()

When .literal() is called with a number or boolean, the literal should get the matching xsd datatype.
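A sketch of the mapping the issue asks for. It assumes integers become xsd:integer and other numbers xsd:double; the issue does not say which numeric datatype it expects. `datatypeFor` is a hypothetical helper, not part of clownface.

```javascript
// Hypothetical helper (not clownface code): choose the XSD datatype IRI
// for a JavaScript value passed to .literal()
const XSD = 'http://www.w3.org/2001/XMLSchema#'

function datatypeFor (value) {
  if (typeof value === 'boolean') return XSD + 'boolean'
  if (typeof value === 'number') {
    // Assumption: whole numbers -> xsd:integer, everything else -> xsd:double
    return Number.isInteger(value) ? XSD + 'integer' : XSD + 'double'
  }
  // Strings (and anything else) stay xsd:string
  return XSD + 'string'
}

console.log(datatypeFor(false)) // http://www.w3.org/2001/XMLSchema#boolean
console.log(datatypeFor(42)) // http://www.w3.org/2001/XMLSchema#integer
console.log(datatypeFor(1.5)) // http://www.w3.org/2001/XMLSchema#double
```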

Fix graph support

The factory doesn't provide a way to pass a graph to the Clownface object. Also, only a single graph should be supported until there is a concept for multiple-graph support.

Cannot delete specific objects

It is only possible to call cf.deleteOut(predicates), which removes all triples with those predicates.

The methods should be extended to take a second parameter with the object(s) to delete:

cf.deleteOut(predicate, objects)
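A minimal sketch of the proposed semantics, assuming quads held in a plain array and terms reduced to `{ termType, value }`. Unlike clownface, which works on a pointer and mutates its dataset, this hypothetical `deleteOut` takes the subject explicitly and returns a filtered copy.

```javascript
// Sketch of deleteOut(predicates, objects) on a plain array of quads
function sameTerm (a, b) {
  return a.termType === b.termType && a.value === b.value
}

function deleteOut (quads, subject, predicates, objects) {
  const preds = [].concat(predicates)
  // Without an objects argument, behave as today: remove every match
  const objs = objects === undefined ? null : [].concat(objects)
  return quads.filter(q => {
    const matches = sameTerm(q.subject, subject) &&
      preds.some(p => sameTerm(p, q.predicate)) &&
      (objs === null || objs.some(o => sameTerm(o, q.object)))
    return !matches // keep only the quads that are not deleted
  })
}

const nn = value => ({ termType: 'NamedNode', value })
const s = nn('http://example.org/s')
const knows = nn('http://example.org/knows')
const quads = [
  { subject: s, predicate: knows, object: nn('http://example.org/a') },
  { subject: s, predicate: knows, object: nn('http://example.org/b') }
]

// Removes only the triple pointing at http://example.org/a
console.log(deleteOut(quads, s, knows, nn('http://example.org/a')).length) // 1
```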

list() does not always return an iterator

The function description of list() says that it always returns an iterator. In reality, it does not:

if (this.term) {
  if (this.term.termType !== 'NamedNode' && this.term.termType !== 'BlankNode') {
    return null
  }

  if (!this.term.equals(this.namespace.nil) && !this.out(this.namespace.first).term) {
    return null
  }
}

This causes issues in, e.g., the SHACL validator, where the spread operator is applied to the list() result (see here).

Depending on the structure of your data, this leads to obscure errors such as listNode.list is not a function or its return value is not iterable
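Until list() is changed, callers that spread its result can guard against null. A minimal caller-side sketch, where `listItems` and the two stand-in pointers are hypothetical:

```javascript
// Caller-side workaround: treat a null/undefined list() result as empty
function listItems (pointer) {
  const list = pointer.list()
  return list == null ? [] : [...list]
}

// Stand-in pointers: one behaves like the null branches above,
// the other returns a real iterator
const notAList = { list: () => null }
const aList = { list: () => ['a', 'b'][Symbol.iterator]() }

console.log(listItems(notAList)) // []
console.log(listItems(aList)) // [ 'a', 'b' ]
```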

ptr.deleteList does not return when trying to delete something that's not a list.

Example:

const parse = require('./support/parse.js')
const rdf = require('./support/factory')

const data = `
<http://buggy> <https://cube.link/view/argument> "2019-01-01T23:00:00.000Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
`

async function main () {
  const cf = await parse(data)
  const ptr = cf.node(rdf.namedNode('http://buggy'))
  ptr.deleteList(rdf.namedNode('https://cube.link/view/argument'))
}

main()

Perhaps some checking is required?
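One possible check, sketched on a plain array of quads with terms reduced to `{ termType, value }`: before deleting, walk the candidate list and stop on anything that is not a well-formed RDF list. The visited set also guards against cyclic rdf:rest chains, which would otherwise loop forever. `isWellFormedList` is hypothetical; whether deleteList should then return early or throw is left open, as in the issue.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

// Objects of (subject, predicate) in a plain array of quads
function objectsOf (quads, subject, predicate) {
  return quads
    .filter(q => q.subject.termType === subject.termType &&
      q.subject.value === subject.value && q.predicate.value === predicate)
    .map(q => q.object)
}

// True only if `head` starts a well-formed list: each node is a named or
// blank node with exactly one rdf:first and one rdf:rest, the chain ends
// in rdf:nil, and no node is visited twice (a cycle would never end)
function isWellFormedList (quads, head) {
  const seen = new Set()
  let node = head
  while (true) {
    if (node.termType === 'NamedNode' && node.value === RDF + 'nil') return true
    if (node.termType !== 'NamedNode' && node.termType !== 'BlankNode') return false
    const key = node.termType + ' ' + node.value
    if (seen.has(key)) return false // cycle detected
    seen.add(key)
    const first = objectsOf(quads, node, RDF + 'first')
    const rest = objectsOf(quads, node, RDF + 'rest')
    if (first.length !== 1 || rest.length !== 1) return false
    node = rest[0]
  }
}

const literal = { termType: 'Literal', value: '2019-01-01T23:00:00.000Z' }
console.log(isWellFormedList([], literal)) // false: a literal never starts a list
```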

Cannot easily create "false"^^xsd:boolean literal

Calling cf.literal(false) does not return a new clownface instance

Does clownface understand the abbreviation `a` instead of `rdf:type`?

TLDR: A simple yes or no answer on the title will satisfy me.

I currently have an issue where I get a text/turtle response like this:

@prefix schema: <http://schema.org/> .

<https://some.iri.com/for-example-1>
    a            schema:Dataset;
    schema:name  "Nice example name" .

<https://some.iri.com/for-example-2>
    a            schema:Dataset;
    schema:name  "Other example name" .

However, this is how I understand the examples:

They use the data from https://github.com/zazuko/tbbt-ld/, where `a` is used in text/turtle for rdf:type:

https://github.com/zazuko/tbbt-ld/blob/master/data/person/amy-farrah-fowler.ttl

So maybe I am using the following clownface call incorrectly in the project:

data.has(Ns.rdf.Type, Ns.schema.Dataset)

Now our simplified code (I do not expect you to look at this)

import namespace, { NamespaceBuilder } from '@rdfjs/namespace'
import defaultFormats from '@rdfjs/formats-common'
import fetch from '@rdfjs/fetch'
import rdfExt from 'rdf-ext'
import DatasetExt from 'rdf-ext/lib/Dataset'
import clownface from 'clownface'

class Ns {
  static schema: NamespaceBuilder = namespace('http://schema.org/')
  static rdf: NamespaceBuilder = namespace(
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  )
}

const RdfClientGet = async (
  url: string,
  params?: URLSearchParams,
  contentType = 'text/turtle',
): Promise<DatasetExt> => {
  let newUrl = url
  if (params) newUrl = url + '?' + params
  const response = await fetch(newUrl, {
    method: 'get',
    headers: {
      Accept: contentType,
      Authorization: 'Bearer ' + getToken(),
    },
  })

  if (response?.status == 404) Router.replace('404')

  const format = response.headers.get('content-type') as string
  const parser = defaultFormats.parsers.get(format)
  const dataset = rdfExt.dataset()
  if (response.body) {
    const stream = parser?.import(response.body as any)
    stream?.on('data', (quad) => dataset.add(quad))
    const streamPromise = new Promise((resolve, reject) => {
      stream?.on('end', () => resolve(stream.read))
      stream?.on('error', (error) => reject(error))
    })
    await streamPromise
  }

  if (response.status > 299) {
    throw new BadRequest(response, dataset)
  }
  return dataset
}

const getData = () => {
  RdfClientGet('https://example.com/')
    .catch((error) => {
      // custom error stuff
    })
    .then((value) => {
      if (value) {
        const data = clownface({ dataset: value })

        const quads = data.has(Ns.rdf.Type, Ns.schema.Dataset)

        quads.forEach((quad) => {
          const source = {
            iri: quad.term.value,
            name: {
              en: quad.out(Ns.schema('name'), { language: 'en' }).value,
              nl: quad.out(Ns.schema('name'), { language: 'nl' }).value,
            },
          }

          console.log(source)
          // Expected:
          // {
          //   iri: https://some.iri.com/for-example-1 // or -2
          //   name: {
          //     en: 'Nice example name' // or 'Other example name'
          //   }
          // }
          // Result:
          // Never able to loop over quads
          //
        })
      }
    })
}
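For reference: in Turtle, the keyword `a` is defined as shorthand for rdf:type, so the parser hands clownface the full IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type. IRIs are case-sensitive, so the `Ns.rdf.Type` in the code above is worth a second look. The sketch uses a minimal Proxy-based stand-in for a namespace builder (not @rdfjs/namespace itself, which is assumed to behave the same way for plain property access) to show that `rdf.Type` and `rdf.type` are different IRIs.

```javascript
// Minimal stand-in for a namespace builder: any property access appends
// the property name verbatim (case included) to the base IRI
function namespace (base) {
  return new Proxy({}, {
    get: (_, name) => ({ termType: 'NamedNode', value: base + String(name) })
  })
}

const rdf = namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#')

// The IRI that Turtle's `a` keyword expands to
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

console.log(rdf.type.value === RDF_TYPE) // true
console.log(rdf.Type.value === RDF_TYPE) // false: ...#Type is a different IRI
```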

Return this on forEach()

forEach should return the clownface instance on which it was invoked.

https://github.com/rdf-ext/clownface/blob/master/lib/Clownface.js#L95
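A minimal sketch of the requested behavior on a hypothetical `Pointer` class (terms are plain strings for brevity): `forEach` visits one single-term pointer per term and returns `this`, so calls can be chained. Note that this deliberately differs from Array.prototype.forEach, which returns undefined.

```javascript
// Hypothetical pointer class; only forEach is shown
class Pointer {
  constructor (terms) {
    this.terms = terms
  }

  forEach (callback) {
    this.terms.forEach(term => callback(new Pointer([term])))
    return this // instead of undefined, so the call can be chained
  }
}

const seen = []
const ptr = new Pointer(['ex:a', 'ex:b'])
const result = ptr.forEach(p => seen.push(p.terms[0]))
console.log(result === ptr) // true
```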

Check where @rdfjs/term-set could be used and update code

The inArray function looks like a good candidate for replacing the existing logic with the TermSet class. This could improve performance and also reduce the complexity of the code in clownface.
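A sketch of why a set-backed lookup helps. It uses a hypothetical `termKey` and a minimal `SimpleTermSet` stand-in rather than the real @rdfjs/term-set or clownface's inArray: the inArray-style check scans the array on every call, while the set answers with one hash lookup after a one-time build.

```javascript
// Identify a term by termType, value and (for literals) language/datatype
function termKey (term) {
  return JSON.stringify([
    term.termType,
    term.value,
    term.language || '',
    term.datatype ? term.datatype.value : ''
  ])
}

// inArray-style check: scans the whole array on every call (O(n))
function inArray (term, terms) {
  return terms.some(t => termKey(t) === termKey(term))
}

// Set-backed check, the idea behind a TermSet: one hash lookup per call
class SimpleTermSet {
  constructor (terms = []) {
    this.keys = new Set(terms.map(termKey))
  }

  has (term) {
    return this.keys.has(termKey(term))
  }
}

const a = { termType: 'NamedNode', value: 'http://example.org/a' }
const set = new SimpleTermSet([a])
console.log(inArray({ ...a }, [a]), set.has({ ...a })) // true true
```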

Use named arguments in the factory and constructor

Changing the interface of the factory and constructor to use named arguments would make cloning objects very easy. Compatible objects, like Node from the rdf-path library, could also be converted easily that way.
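A sketch of what named arguments could look like, using a hypothetical `Pointer` class and plain strings as terms. Because the constructor reads a single options object, `clone()` only has to spread the current values. Any object with the same property names could be passed straight in; that rdf-path's Node exposes `dataset`, `term` and `graph` is an assumption.

```javascript
// Hypothetical constructor taking one options object
class Pointer {
  constructor ({ dataset, term, graph, factory } = {}) {
    this.dataset = dataset
    this.term = term
    this.graph = graph
    this.factory = factory
  }

  // Copy the current options, overriding only what was passed in
  clone (overrides = {}) {
    return new Pointer({
      dataset: this.dataset,
      term: this.term,
      graph: this.graph,
      factory: this.factory,
      ...overrides
    })
  }
}

const a = new Pointer({ dataset: [], term: 'ex:a' })
const b = a.clone({ term: 'ex:b' }) // same dataset, different term
console.log(b.dataset === a.dataset, b.term) // true ex:b
```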

async traversals

https://gitter.im/rdfjs/public?at=5c98ef108126720abc3fac93

Based on the chat conversation, it looks like v1.0 will no longer have an async interface. I would like to use clownface with a store that uses Triple Pattern Fragments behind the scenes, so it will need async traversal.

Rename?

If people search for "clownface", they're going to get "A deranged serial killer known as 'Clownface' terrorises the residents of a small town." (tags: killer clown, mask, masked killer, b horror, slasher).

How about this for a name: graphlim

Add Symbol.iterator to Clownface class

This could be used like toArray(), but also in for..of loops and for direct destructuring:

const people = cf({ dataset }).has(rdf.type, schema.Person)

for (const person of people) {
 ...
}
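A sketch of how a `Symbol.iterator` generator could provide this, on a hypothetical `Pointer` class with string terms: each step yields a pointer for a single term, so for..of, spread and destructuring all work.

```javascript
// Hypothetical pointer over several terms
class Pointer {
  constructor (terms) {
    this.terms = terms
  }

  // One single-term pointer per term
  * [Symbol.iterator] () {
    for (const term of this.terms) {
      yield new Pointer([term])
    }
  }
}

const people = new Pointer(['ex:amy', 'ex:sheldon'])
for (const person of people) {
  console.log(person.terms[0])
}
const [first] = people // destructuring takes the first pointer
```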

Add option to provide custom factory for terms

Currently all terms are generated using the @rdfjs/data-model factory. It should be possible to use a custom factory: it would be passed as an option to the constructor, and the instance would keep it and use only that factory to create Term instances.
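A sketch of the requested option on a hypothetical `Pointer` class: the factory is passed to the constructor and used for every term the instance creates. Any object following the RDF/JS DataFactory interface (`namedNode(value)`, `literal(value, languageOrDatatype)`) would fit; the toy factory below only exists to make its use visible.

```javascript
// Hypothetical pointer that creates terms only through its own factory
class Pointer {
  constructor ({ factory }) {
    this.factory = factory
  }

  namedNode (iri) {
    return this.factory.namedNode(iri)
  }

  literal (value, languageOrDatatype) {
    return this.factory.literal(value, languageOrDatatype)
  }
}

// Toy factory that marks its terms
const myFactory = {
  namedNode: value => ({ termType: 'NamedNode', value, createdBy: 'myFactory' }),
  literal: (value, lang) => ({
    termType: 'Literal',
    value,
    language: typeof lang === 'string' ? lang : '',
    createdBy: 'myFactory'
  })
}

const ptr = new Pointer({ factory: myFactory })
console.log(ptr.namedNode('http://example.org/').createdBy) // myFactory
```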

Allow creating empty NamedNodes

The namedNode method is causing problems when it's called with an empty string. The example below doesn't create a triple:

const resource = clownface({ dataset })
resource.addOut(ns.rdfs.label, resource.namedNode(''))

Add .graph and .graphs property

It should be possible to read the graph from the context. A readable .graph property like .term should be added. To handle cases with more than one graph, a .graphs property should also be added.
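A sketch of the two getters on a hypothetical `Pointer` class, with graphs as plain strings: `.graphs` returns every graph in the context and `.graph` returns one only when there is exactly one. Returning undefined otherwise is an assumption based on the `.term` analogy in the issue.

```javascript
// Hypothetical pointer that tracks the graphs of its context
class Pointer {
  constructor ({ graphs = [] } = {}) {
    this._graphs = graphs
  }

  // All graphs, as a copy so callers cannot mutate the context
  get graphs () {
    return [...this._graphs]
  }

  // The graph, only if the context has exactly one (assumption)
  get graph () {
    return this._graphs.length === 1 ? this._graphs[0] : undefined
  }
}

console.log(new Pointer({ graphs: ['ex:g1'] }).graph) // ex:g1
```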

Add method(s) to filter the results based on termType

There should be one or multiple methods to filter the results based on the termType. This can be done already with the .filter method, but there should be a shorter, more readable option to handle such cases. The filter code for a NamedNode would look like this:

cf.filter(ptr => ptr.term.termType === 'NamedNode')

Support recursion

Without context, addOut creates blank subjects

Same would likely apply to addIn and the created objects

const cf = require("clownface")
const { dataset } = require('rdf-ext')
const { rdf, schema } = require('@tpluscode/rdf-ns-builders')

const graph = cf({ dataset: dataset() })

graph.addOut(rdf.type, schema.Person)

;[...graph.dataset][0].subject

Current

The code above creates a quad with an undefined subject.

Expected

Initially I thought this should be a silent no-op, but that would probably just cause hard-to-track errors down the line for consumers.

Should it just throw?
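If the answer is to throw, the guard could look like this sketch: a hypothetical standalone `addOut` over an array of quads, with terms as strings. With no subject terms in the context it throws instead of writing a quad with an undefined subject.

```javascript
// Hypothetical addOut with a guard for an empty context
function addOut (quads, subjects, predicate, object) {
  if (subjects.length === 0) {
    throw new Error('addOut() called without a subject term in the context')
  }
  for (const subject of subjects) {
    quads.push({ subject, predicate, object })
  }
  return quads
}

const quads = []
try {
  addOut(quads, [], 'rdf:type', 'schema:Person') // no context
} catch (err) {
  console.log(err.message)
}
```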

Improve list iteration

I was just handling an RDF List and noticed that list() returns only an iterator of pointers. I would like to propose extending it with additional getters which would likewise iterate terms and values, similar to how a pointer itself has the .terms and .values properties.

This would allow simpler usage, like when doing spread and map. For example, trying to get all unique terms from a list:

import TermSet from '@rdfjs/term-set'

let listPointer

-const set = new TermSet([...listPointer.list()].map(({ term }) => term))
+const set = new TermSet([...listPointer.list().terms])
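A sketch of the proposed getters, wrapping whatever pointer iterator list() returns (pointers are assumed to expose `.term`). `withTermsAndValues` is hypothetical; it materializes the iterator once so that iterating, `.terms` and `.values` all see the same items.

```javascript
// Hypothetical wrapper adding .terms and .values to a list iterator
function withTermsAndValues (pointers) {
  const items = [...pointers] // materialize once so every view can re-iterate
  return {
    [Symbol.iterator]: () => items[Symbol.iterator](),
    get terms () { return items.map(p => p.term) },
    get values () { return items.map(p => p.term.value) }
  }
}

const list = withTermsAndValues([
  { term: { termType: 'NamedNode', value: 'http://example.org/a' } },
  { term: { termType: 'NamedNode', value: 'http://example.org/b' } }
])
console.log(list.values) // [ 'http://example.org/a', 'http://example.org/b' ]
```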

Support language tags in `out`

We would like to add an optional parameter to narrow down the object returned by ptr.out() to specific language(s).

The new signature might look like the one below, accepting a single language tag string or an array thereof:

out(term: NamedNode, options: { language?: string | string[] } = { })

Examples

Given RDF like

ex:ananas a ex:Fruit ;
  rdfs:label "Pineapple" ;
  rdfs:label "Ananas"@pl ;
  rdfs:label "Ananas"@de ;
  rdfs:label "Ananász"@hu ;
  rdfs:label "Ananas"@sr-Latn ;
  rdfs:label "Ананас"@sr-Cyrl ;
  rdfs:label _:foo .

ex:apple a ex:Fruit ;
  rdfs:label "Apple"@en ;
  rdfs:label "Apfel"@de ;
  rdfs:label "Јабука"@sr-Cyrl ;.

ex:eggplant a ex:Vegetable ;
  rdfs:label "Psianka podłużna"@pl, "Bakłażan"@pl, "Oberżyna"@pl .

Blank nodes and named nodes are never returned when a language is used.

To get only plain strings (no language tag)

// "Pineapple"
ananas.out(rdfs.label, { language: '' })

// also only "Pineapple" because apple has only langStrings
fruit.out(rdfs.label, { language: '' })

To get only a specific language

// "Ananas"@de
ananas.out(rdfs.label, { language: 'de' })

// [ "Ananas"@de, "Apfel"@de ]
fruit.out(rdfs.label, { language: 'de' })

// [ "Apple"@en ]
// no pineapple because there is no @en label
fruit.out(rdfs.label, { language: 'en' })

// return empty (no matching language)
ananas.out(rdfs.label, { language: 'fr' })

To get one of multiple languages

An array can be used. For every input node, the languages are evaluated in their order in the array

// "Apple"@en
// only one language returned
apple.out(rdfs.label, { language: [ 'en', 'de' ] })

For multiple input nodes, only one language returned per node

// [ "Ananász"@hu, "Apfel"@de ]
// Hungarian for pineapple but German for apple
fruit.out(rdfs.label, { language: [ 'hu', 'de' ] })

To get any language

A wildcard (or undefined) can be used to select any language

// No Hungarian for apple
// An unspecified language will be selected (en or de)
apple.out(rdfs.label, { language: [ 'hu', '*' ] })

Support for secondary language

Exact match can be used all the same

// "Ананас"@sr-Cyrl 
ananas.out(rdfs.label, { language: [ 'sr-Cyrl' ] })

A primary language will match an arbitrary secondary tag. Below, sr-Latn is not found, so sr matches sr-Cyrl (an exact match would still come first).

// "Јабука"@sr-Cyrl
apple.out(rdfs.label, { language: [ 'sr-Latn', 'sr' ] })

Multiple values for a language

All will be returned

// [ "Psianka podłużna"@pl, "Bakłażan"@pl, "Oberżyna"@pl ]
eggplant.out(rdfs.label, { language: 'pl' })
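A sketch of the selection rules proposed above, for the objects of a single input node (results for several nodes would be concatenated). Assumptions not fixed by the issue: tags are compared case-insensitively (BCP 47 tags are case-insensitive), `'*'` and `undefined` pick the language of the first tagged literal found, and a primary tag such as `sr` falls back to the first matching longer tag. `selectByLanguage` is a hypothetical name, not clownface's API.

```javascript
// Objects are plain { termType, value, language } records
function selectByLanguage (objects, languages) {
  // Blank nodes and named nodes are never returned once a language is given
  const literals = objects.filter(o => o.termType === 'Literal')
  const langOf = l => l.language.toLowerCase()

  for (const raw of [].concat(languages)) {
    const tag = raw === undefined ? '*' : raw.toLowerCase()

    if (tag === '*') {
      // Any language: take the first tagged literal's language (assumption)
      const tagged = literals.filter(l => langOf(l) !== '')
      if (tagged.length > 0) return tagged.filter(l => langOf(l) === langOf(tagged[0]))
      continue
    }

    // Exact match first; '' selects plain strings without a language tag
    const exact = literals.filter(l => langOf(l) === tag)
    if (exact.length > 0) return exact

    // Then a primary tag like 'sr' matches a longer tag like 'sr-Cyrl'
    if (tag !== '') {
      const longer = literals.filter(l => langOf(l).startsWith(tag + '-'))
      if (longer.length > 0) return longer.filter(l => langOf(l) === langOf(longer[0]))
    }
  }
  return []
}

const lit = (value, language = '') => ({ termType: 'Literal', value, language })
const apple = [lit('Apple', 'en'), lit('Apfel', 'de'), lit('Јабука', 'sr-Cyrl')]

console.log(selectByLanguage(apple, ['sr-Latn', 'sr']).map(l => l.value)) // [ 'Јабука' ]
console.log(selectByLanguage(apple, ['hu', '*']).map(l => l.value)) // [ 'Apple' ]
```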

Empty strings are mapped to undefined

When an empty string is passed to create a node with clownface, undefined is returned, which causes hard-to-detect errors down the line.

The problem lies on this line: https://github.com/rdf-ext/clownface/blob/master/lib/term.js#L53-L55

I notice that term() actually relies on handling null and undefined values (returning undefined for them). I'd therefore propose the following change, so that empty strings or zeros are not ignored:

-  if (!value) {
+  if (value === null || typeof value === 'undefined') {
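The difference between the two checks, shown on the values that matter here: `!value` treats `''`, `0` and `false` as missing, while the explicit null/undefined test keeps them.

```javascript
// The current falsy check versus the proposed explicit check
const isMissingFalsy = value => !value
const isMissingNullish = value => value === null || typeof value === 'undefined'

for (const value of ['', 0, false, null, undefined]) {
  console.log(JSON.stringify(value), isMissingFalsy(value), isMissingNullish(value))
}
// '', 0 and false count as missing only under the falsy check
```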

add .literal and .namedNode methods

There should be a simpler way to create CF objects than calling .node with type, datatype or language parameters when a Named Node context, or a Literal context with a specific datatype or language, should be created. For this use case a slightly more explicit API would be useful. For Named Nodes it would simply be:

.namedNode(string iri)

for a Literal:

.literal(string value, string|NamedNode|ClownFace languageOrDatatype)

Since a language tag never contains a ':', the meaning of the languageOrDatatype parameter can be detected automatically.
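A sketch of the automatic detection. It assumes a string with a ':' is a datatype IRI, a string without one is a language tag, and an object is either a NamedNode or a clownface object carrying its term in `.term`. `splitLanguageOrDatatype` is hypothetical and returns the datatype as an IRI string rather than a NamedNode for brevity.

```javascript
// Hypothetical helper for .literal(value, languageOrDatatype)
function splitLanguageOrDatatype (languageOrDatatype) {
  if (languageOrDatatype === undefined || languageOrDatatype === null) return {}
  if (typeof languageOrDatatype === 'string') {
    // Language tags never contain ':', IRIs always do
    return languageOrDatatype.includes(':')
      ? { datatype: languageOrDatatype }
      : { language: languageOrDatatype }
  }
  // NamedNode, or a clownface-like object holding its term in .term (assumed shape)
  const term = languageOrDatatype.term || languageOrDatatype
  return { datatype: term.value }
}

console.log(splitLanguageOrDatatype('de')) // { language: 'de' }
```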

Is this possible in CF?

From @l00mi on April 7, 2015 20:11

var fragmentsClient = new ldf.FragmentsClient('http://fragments.dbpedia.org/2014/en');
var ldfquery = 'CONSTRUCT { <'+source+'> ?p ?o. ?o <http://www.w3.org/2000/01/rdf-schema#label> ?l} WHERE {<'+source+'> ?p ?o. OPTIONAL {?o <http://www.w3.org/2000/01/rdf-schema#label> ?l.}} LIMIT 100000';

Copied from original issue: rdf-ext/rdf-ext#24

Add getters to check if there is any context

As we discussed, two simple properties might be handy:

const areThereAny = clownface.out(prop).any
const areThereNo = clownface.out(prop).none
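A sketch of both getters on a hypothetical `Pointer` class that keeps its context terms in an array.

```javascript
// Hypothetical pointer with any/none getters
class Pointer {
  constructor (terms = []) {
    this.terms = terms
  }

  // At least one term in the context
  get any () {
    return this.terms.length > 0
  }

  // No term in the context
  get none () {
    return this.terms.length === 0
  }
}

console.log(new Pointer([]).none, new Pointer(['ex:a']).any) // true true
```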

better handling of non-list or empty list object values in .list()

With the current code the following example creates an array with undefined as the only element.

const rdf = require('rdf-ext')
const clownface = require('.')

const subject = rdf.namedNode('http://example.org/subject')
const predicate = rdf.namedNode('http://example.org/predicate')
const ptr = clownface({ dataset: rdf.dataset(), term: subject })
const list = ptr.list(predicate)
console.log([...list].map(e => e.term))

The return value should be null for this case.

Here is the full list of edge cases and the expected return values:

empty list: <subject> <predicate> ().
Expected: []

non-list object: <subject> <predicate> "test".
Expected: null

no matching triple: (no triple)
Expected: null

The documentation should be extended to show how a non-list object can be combined with .out:

const list = shape.list(ns.sh.path)
const values = (list && [...list]) || [shape.out(ns.sh.path)]

Feature request: Provide some immutable update methods

Thanks for the great lib! I'm using it with React and it works fairly seamlessly after some figuring out. One nice-to-have would be immutable update methods similar to the query methods: instead of adding to or removing from a dataset in place, they would return a new clownface instance over a cloned dataset with the changes applied. These could then be used with setState and useReducer, and would make silent in-place updates of the dataset less likely. At the moment I'm just running the updates and then calling setState to trigger an update, along with some memoization to minimize component updates.
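A sketch of one immutable update under stated assumptions: the state is a plain object holding a term and an array of quads (strings stand in for terms), and `addOutImmutable` is hypothetical. Copying the array gives React a new reference to compare, at the cost of an O(n) copy per update, which may matter for large datasets.

```javascript
// Hypothetical immutable addOut: never touches the input state
function addOutImmutable (state, predicate, object) {
  const quads = [...state.quads, { subject: state.term, predicate, object }]
  return { ...state, quads }
}

// Could be used as a reducer step, e.g.
// useReducer((state, a) => addOutImmutable(state, a.predicate, a.object), initial)
const before = { term: 'ex:alice', quads: [] }
const after = addOutImmutable(before, 'ex:knows', 'ex:bob')
console.log(before.quads.length, after.quads.length) // 0 1
```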

.out with language not returning expected literals

Consider a resource with only language-tagged labels, all with primary tags:

<> rdfs:label "foo"@en , "le foo"@fr , "das Foo"@de .

What should be the output of .out(rdfs.label, { language: [ 'en-US' ] }) ?

Right now it returns nothing, but I think a more specific language should implicitly be followed by less specific ones. So, for example, a filter by a tag with a tertiary subtag

ptr.out(rdfs.label, { language: [ 'de-DE-1990', 'en' ] })

should be equivalent to

ptr.out(rdfs.label, { language: [ 'de-DE-1990', 'de-DE', 'de', 'en' ] })
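A sketch of the implicit fallback: after each requested tag, its less specific prefixes are appended by dropping subtags from the right, skipping tags already present. This is a simplified form of the RFC 4647 lookup truncation; unlike the RFC, it does not special-case single-character subtags such as the private-use `x`. `withFallbacks` is hypothetical.

```javascript
// Expand each tag into itself plus its shorter prefixes, in order
function withFallbacks (languages) {
  const result = []
  for (const tag of languages) {
    const subtags = tag.split('-')
    for (let n = subtags.length; n > 0; n--) {
      const candidate = subtags.slice(0, n).join('-')
      if (!result.includes(candidate)) result.push(candidate)
    }
  }
  return result
}

console.log(withFallbacks(['de-DE-1990', 'en']))
// [ 'de-DE-1990', 'de-DE', 'de', 'en' ]
```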

Add .distinct method to reduce result to unique terms/dataset/graph

After traversing a graph, there are often duplicates. Sometimes this is wanted (e.g. counting matching paths), so it can't be the default, but often only the unique set is required.

A .distinct method should be added that returns a new object with a reduced context with unique terms/dataset/graph.
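A sketch of the core of such a method: keep the first occurrence of each term, comparing termType, value, language and datatype. `distinctTerms` is hypothetical and only covers terms; reducing the dataset and graph part of the context would need the same treatment.

```javascript
// Hypothetical .distinct() core: first occurrence of each term wins
function distinctTerms (terms) {
  const seen = new Set()
  return terms.filter(term => {
    const key = JSON.stringify([
      term.termType,
      term.value,
      term.language || '',
      term.datatype ? term.datatype.value : ''
    ])
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}

const ex = value => ({ termType: 'NamedNode', value: 'http://example.org/' + value })
console.log(distinctTerms([ex('a'), ex('b'), ex('a')]).map(t => t.value))
// [ 'http://example.org/a', 'http://example.org/b' ]
```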