Giter VIP home page Giter VIP logo

node-rdfa's Introduction

ECMAScript RDFa Parser

A fairly modular, cross-platform, extensible package to extract RDF information from RDFa-enabled documents.

Features:

  • Sometimes extracts RDF statements
  • Passes all compliant XML tests
  • Walks DOM tree to generate an RDF graph
  • Extensible design for parsing supersets of RDFa and supporting new media types

Goals:

  • Consume DOM nodes or SAX events
  • Output a complete graph or "Statement" events
  • Support custom types of nodes (e.g. variables)
  • Query for DOM nodes by data they contain
  • Linting of RDFa documents for:
    • check property order
    • reverse/forward incompatability
    • shadowed definitions
    • defining over registered URI schemes
    • non-normalized IRI references
    • Invalid IRIs
    • Invalid CURIEs
    • Term is used without an active vocabulary

Features

Parse a DOM tree into an RDF graph

const rdfa = require('rdfa');
const fs = require('fs');
const DOMParser = require('xmldom').DOMParser;

const filepath = './test/rdfa.github.io/test-suite/test-cases/rdfa1.1-lite/xhtml1/0021.xhtml';
const document = new DOMParser().parseFromString(fs.readFileSync(filepath, 'UTF-8'), 'text/xml');
const result = rdfa.parseDOM(rdfa.RDFaXMLParser, 'http://example.com/', document);
result.outputGraph.forEach(function(n){ console.log(n.toTurtle()); });
<http://example.com/> <http://purl.org/dc/elements/1.1/creator> "Mark Birbeck" .

Support for standards-based XML parsers

There is no included document parser, instead supports reading from a DOM (W3C or compatible), or a SAX-like event stream.

Tests are written for the following packages:

Support for different media types

Supports plain XML, HTML, and SVG out-of-the-box, and is extendible for new host languages.

Supported host languages:

  • RDFaXMLParser - for plain XML documents
  • RDFaXHTMLParser - for HTML documents delivered in application/xhtml+xml
  • RDFaHTMLParser - for HTML documents delivered in text/html

File index

  • bin/httpd.js - an HTTP server for use with the RDFa Test Suite
  • bin/rdfa-turtle.js - read a file and print extracted triples
  • defaults.json - list of RDFa Initial Context vocabulary & terms
  • index.js - the bulk of the logic
  • test/suite.js - runs the library against a local copy of the RDFa test suite

API

parseDOM(Processor, base, document, options)

Extracts RDF statements out of a DOM document document, assuming a URI base base (always ignoring the URI in the DOM node, if any).

Returns an RDFaParser instance (see below).

  • Processor: a reference to a subclass of RDFaParser
  • base: the URI base for the document (where the document was downloaded from)
  • document: DOM document
  • options: object with additional configuration
    • forceVersion: RDFa version to use. This is normally detected.
    • defaultLanguage: default language for when no language is specified by the document (use the language specified in the Content-Language header, if any)

RDFaParser

Maintains state during processing of a document. Pass a subclass of this to parseDOM as the first argument Processor.

RDFaXMLParser

RDFaParser with the default RDF context loaded.

RDFaXHTMLParser

RDFaParser extended with the parsing rules for application/xhtml+xml documents.

RDFaHTMLParser

RDFaParser extended with the parsing rules for text/html documents.

RDFaContext

Represents an RDFa processing context that's generated during processing of an element and passed to child elements.

node-rdfa's People

Contributors

awwright avatar csarven avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

csarven

node-rdfa's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.