
skript-cc commented on May 26, 2024

> That should work fine in hast I believe!

Works fine in hast indeed but does not work in the parser. I've been told it's invalid according to the RDFa spec:

> the attributes in this specification SHOULD be defined in 'no namespace' (e.g., when the attributes are used on elements in the Host Language's namespace, they can be used with no qualifying prefix: <myml:myElement property="license">).

Complex indeed, but perfectly understandable. I'm going to consider this as a limitation of RDFa. Perhaps microdata or microformats are a better fit.

Thank you very much for your insights on this matter!

from hast-util-to-html.

wooorm commented on May 26, 2024

Thanks! Hmm, complex.
Do you have some more info on what pipeline you have, how are you using all this? What’s that RDF parser later in the chain?

wooorm commented on May 26, 2024

> Is there another way to prevent converting the datatype attribute to kebab case?

Maybe, depends on more info on your setup

> Would it be possible/desirable to set the space configuration option to a custom schema?

I don’t think hast should allow it. It’s really for HTML. For namespaces and such, you could go from hast to xast (XML AST), and do some work there.

> or, is the best option to deal with this in the property-information package after all?

Probably not. It might be possible though, I see React does support it (although, maybe that’s for the SVG one)

skript-cc commented on May 26, 2024

Thanks for your answer! The hast to xast conversion could be a sufficient workaround; I will explore what I can do with that. I already tried to add a namespace to datatype; this might be the missing piece.

Background and setup

I'm working on a proof of concept to investigate the use of simple semantic annotations in markdown and word processor documents. Something along the lines of this idea: http://blog.sparna.fr/2020/02/20/semantic-markdown/. Since both formats are easily converted to HTML (markdown basically is HTML), I'm using rehype to parse the HTML, convert the annotations to RDFa attributes, and stringify the result to an HTML document again. This leads to a machine-readable document, i.e. a document that's easily processed by tools in the RDF ecosystem.

The role of rehype/unified in this story is: HTML doc with custom annotation strings goes in, HTML document with RDF attributes comes out.

import rdfParser from 'rdf-parse';
import { JsonLdSerializer } from 'jsonld-streaming-serializer';
import jsonld from 'jsonld';
import unified from 'unified';
import stream from 'unified-stream';
import parseHtml from 'rehype-parse';
import toHtml from 'rehype-stringify';
import annotations from './plugin.js';

const htmlStream = process.stdin.pipe(
  stream(
    unified()
    .use(parseHtml, {emitParseErrors: true, duplicateAttribute: false})
    .use(annotations)
    .use(toHtml)
  )
);
htmlStream.pipe(process.stdout);

Since RDFa is just HTML + some extra attributes, I didn't think I was trying to do something with rehype that it isn't designed to do. Everything worked like a charm, until I bumped into the datatype problem.

Currently I'm using https://github.com/rubensworks/rdf-parse.js to parse the RDFa document (again), which uses https://github.com/rubensworks/rdfa-streaming-parser.js under the hood. The parser emits a stream of quads, which I feed into some other JSON LD tools/serializers, to play with.

const rdfStream = rdfParser.default.parse(
  htmlStream,
  {
    contentType: 'text/html',
    baseIRI: 'https://example.org/',
  }
);

let jsonStr = '';
const context = {
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  xsd: "http://www.w3.org/2001/XMLSchema#",
  schema: "http://schema.org/",
  ex: "http://example.org/",
};

rdfStream
  .pipe(new JsonLdSerializer({
    space: '  ',
    context,
  }))
  .on('data', chunk => jsonStr += chunk)
  .on('end', () => {
    const doc = JSON.parse(jsonStr);
    jsonld.compact(doc, context).then(data => {
      console.log('Compacted: ', data)
    });
    jsonld.frame(doc, { ['@context']: context}).then(data => {
      console.log('Framed: ', data)
    });
  });

I know all this parsing and serialization is very inefficient and makeshift. That's a concern to deal with later.

skript-cc commented on May 26, 2024

hast-util-to-xast exhibits the same behavior: datatype becomes data-type. The only difference is that xast-util-to-xml does not format data attribute names. This opens up the opportunity to find data-type attributes and change them back to datatype before stringifying. So, that's a workaround at least!
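The rename step could look roughly like the sketch below. It assumes xast-style nodes (elements carrying a `name`, an `attributes` object, and `children`) and walks the tree by hand, so it needs no extra packages. `restoreDatatype` is a hypothetical helper name, not part of any unified package, and the sample tree is made up for illustration.

```javascript
// Hypothetical helper: rename `data-type` back to `datatype` on every element
// of an xast-like tree (nodes shaped as {type, name, attributes, children}).
function restoreDatatype(node) {
  if (
    node.type === 'element' &&
    node.attributes &&
    'data-type' in node.attributes
  ) {
    // Pull out the kebab-cased attribute and re-add it under the RDFa name.
    const {'data-type': value, ...rest} = node.attributes;
    node.attributes = {...rest, datatype: value};
  }

  // Recurse into children; text and other leaf nodes have none.
  for (const child of node.children || []) {
    restoreDatatype(child);
  }

  return node;
}

// Made-up sample tree, shaped like the output of a hast-to-xast conversion.
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      name: 'span',
      attributes: {property: 'schema:datePublished', 'data-type': 'xsd:date'},
      children: [{type: 'text', value: '2021-03-01'}]
    }
  ]
};

restoreDatatype(tree);
console.log(tree.children[0].attributes);
```

Note that this renames every `data-type` attribute, so a document that uses `data-type` as a genuine dataset attribute would need a narrower check.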

I also tried to namespace datatype (e.g. rdf:datatype), but the rdf parser ignores it. Filed an issue for that too.

wooorm commented on May 26, 2024

> This opens up the opportunity to find data-type attributes and change them back to datatype before stringifying. So, that's a workaround at least!

Yep, indeed. hast is for HTML. It has one limitation (or, one less thing to worry about): it supports properties, not attributes. And here it becomes complex because it’s assuming dataType / datatype / data-type are dataset things (similar to data-my-id=123 or so). It typically ignores unknown things, but because data* is arbitrary, it can’t.
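To make that concrete, here is a simplified sketch of the kind of normalization involved. It is an illustration written for this thread, not the actual property-information or hast-util-to-html code, and `toDataAttribute` is a hypothetical name. The assumption it encodes is the one described above: anything longer than `data` that starts with `data` is treated as a dataset entry and serialized as a `data-*` attribute.

```javascript
// Simplified illustration (an assumption, not the real library code):
// treat any name that starts with `data` as a dataset entry and
// serialize it as a kebab-cased `data-*` attribute.
function toDataAttribute(name) {
  // Names that are not longer than `data`, or do not start with it, pass through.
  if (name.length <= 4 || name.slice(0, 4).toLowerCase() !== 'data') {
    return name;
  }

  const rest = name.slice(4);

  // Already in attribute form, e.g. `data-type`.
  if (rest.charAt(0) === '-') {
    return name;
  }

  // camelCase to kebab-case: `MyId` becomes `-my-id`.
  const dashed = rest.replace(/[A-Z]/g, (c) => '-' + c.toLowerCase());

  // `datatype` has no capital after `data`, so a dash is added anyway.
  return 'data' + (dashed.charAt(0) === '-' ? dashed : '-' + dashed);
}

for (const name of ['dataType', 'datatype', 'data-type', 'dataMyId', 'class']) {
  console.log(name, '=>', toDataAttribute(name));
}
```

Under this rule there is no way to tell an RDFa `datatype` apart from a dataset entry named `type`, which is why the attribute comes out as `data-type`.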

datatype is complex, because it’s non-standard:

> The specification makes use of the rdf:HTML datatype. This feature is non-normative, because the equality of the literal values depend on DOM4 [dom4], a specification that has not yet reached W3C Recommendation status.
> https://www.w3.org/TR/html-rdfa/

^-- but that doc is from 6 years ago, and about 2 years ago, WHATWG HTML became the one HTML: https://www.w3.org/blog/2019/05/w3c-and-whatwg-to-work-together-to-advance-the-open-web-platform/.
As WHATWG HTML doesn’t list datatype, it remains a non-standard attribute as far as HTML is concerned.

> I also tried to namespace datatype (e.g. rdf:datatype), but the rdf parser ignores it. Filed an issue for that too.

That should work fine in hast I believe!

wooorm commented on May 26, 2024

Microdata is specifically supported in HTML, so that’s probably going to work well. Good luck!
