glennjones / microformat-node

Microformats parser for node.js

Home Page: http://glennjones.net/tools/microformats/

License: MIT License

microformat-node's Introduction

microformat-node

A node.js microformats parser. It shares its codebase with the microformat-shiv project, but uses the fast HTML DOM library cheerio to parse HTML.

Installation

$ npm i microformat-node

Methods

Parsing
- get
Discovery
- count
- isMicroformat
- hasMicroformats

get

The get method parses microformats data from either an HTML string or a cheerio object.

Simple parse of HTML string.

    var Microformats = require('microformat-node'),
        options = {};

    options.html = '<a class="h-card" href="http://glennjones.net">Glenn</a>';

    Microformats.get(options, function(err, data){
        // do something with data
    });

Simple parse of a Cheerio-parsed page.

    var Microformats = require('microformat-node'),
        Cheerio = require('cheerio'),
        options = {};

    options.node = Cheerio.load('<a class="h-card" href="http://glennjones.net">Glenn</a>');
    Microformats.get(options, function(err, data){
        // do something with data
    });

Options

html - (String) the HTML to be parsed
node - (Cheerio DOM object) the element to be parsed
filter - (Array) microformats types returned, e.g. ['h-card']; rels are always added
baseUrl - (String) a base URL used to resolve any relative URLs
textFormat - (String) text style, whitespacetrimmed or normalised; default is whitespacetrimmed
dateFormat - (String) the ISO date profile: auto, microformat2, w3c, rfc3339 or html5; default is auto
add - (Array) adds microformat version 1 definitions

I recommend always setting the textFormat option to normalised. This is not part of the microformat parsing rules, but in most cases it provides more usable output.
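The options above can be collected in one place before calling get. The following is a minimal sketch; buildOptions is a hypothetical helper, not part of microformat-node, and it only uses the option names documented above.

```javascript
// Hypothetical helper (not part of microformat-node): builds an options
// object for Microformats.get from the option names documented above.
function buildOptions(html, overrides) {
    var defaults = {
        // Recommended above; the library default is 'whitespacetrimmed'.
        textFormat: 'normalised',
        dateFormat: 'auto'
    };
    // Later arguments win, so explicit overrides replace the defaults.
    return Object.assign({ html: html }, defaults, overrides || {});
}

var options = buildOptions('<a class="h-card" href="/about">Glenn</a>', {
    baseUrl: 'http://glennjones.net',
    filter: ['h-card']
});

console.log(options.textFormat); // normalised
```

The resulting object could then be passed as the first argument to Microformats.get.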

Experimental Options

These options are part of ongoing specification development. They may be removed or renamed in future.

lang - (Boolean) parses and adds the language value to e-* properties; default is false
parseLatLonGeo - (Boolean) parses geo data written as latlon, e.g. 30.267991;-97.739568; default is false

Output

JSON output. This is an example of a parsed h-card microformat.

    {
        "items": [{
            "type": ["h-card"],
            "properties": {
                "url": ["http://blog.lizardwrangler.com/"],
                "name": ["Mitchell Baker"],
                "org": ["Mozilla Foundation"],
                "note": ["Mitchell is responsible for setting the direction Mozilla ..."],
                "category": ["Strategy", "Leadership"]
            }
        }],
        "rels": {},
        "rel-urls": {}
    }
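Because every property value is an array, reading a single value takes a little unwrapping. Here is a minimal sketch of one way to do that; firstValues is a hypothetical helper and the sample object is a shortened copy of the output above.

```javascript
// Hypothetical helper: collects the first value of one property from every
// item of a given type, for parsed output shaped like the example above.
function firstValues(data, type, property) {
    return (data.items || [])
        .filter(function (item) {
            return item.type.indexOf(type) !== -1;
        })
        .map(function (item) {
            var values = (item.properties || {})[property] || [];
            return values[0];
        })
        .filter(function (value) {
            return value !== undefined;
        });
}

// Shortened copy of the h-card output shown above.
var sample = {
    items: [{
        type: ['h-card'],
        properties: {
            url: ['http://blog.lizardwrangler.com/'],
            name: ['Mitchell Baker']
        }
    }],
    rels: {},
    'rel-urls': {}
};

console.log(firstValues(sample, 'h-card', 'name')); // [ 'Mitchell Baker' ]
```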

Count

The count method returns the number of each microformat type found. It does not do a full parse, so it is much quicker than get and can be used for tasks such as adding notifications to the UI. The method can take an options object as a parameter.

    var Microformats = require('microformat-node'),
        options = {};

    options.html = '<a class="h-card" href="http://glennjones.net">Glenn</a>';
    Microformats.count(options, function(err, data){
        // do something with data
    });

Output

    {
        'h-event': 1,
        'h-card': 2,
        'rels': 6
    }
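For the notification use case, the count result can be reduced to a single number. This is a sketch only; countItems is a hypothetical helper, and leaving rels out of the total is an assumption about what a UI badge would want to show.

```javascript
// Hypothetical helper: sums the per-type counts, skipping the 'rels' entry
// (an assumption: a UI badge usually cares about h-* items only).
function countItems(counts) {
    return Object.keys(counts).reduce(function (total, key) {
        return key === 'rels' ? total : total + counts[key];
    }, 0);
}

// Same shape as the count output above.
var counts = { 'h-event': 1, 'h-card': 2, 'rels': 6 };

console.log(countItems(counts)); // 3
```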

isMicroformat

The isMicroformat method returns whether a node has a valid microformats class. It currently does not consider rel=* a microformat. The method can take an options object as a second parameter.

    var Microformats = require('microformat-node'),
        options = {};

    options.html = '<a class="h-card" href="http://glennjones.net">Glenn</a>';
    Microformats.isMicroformat(options, function(err, isValid){
        // do something with isValid
    });

hasMicroformats

The hasMicroformats method returns whether a document or node has any valid microformats class. It currently does not take rel=* microformats into account. The method can take an options object as a second parameter.

    var Microformats = require('microformat-node'),
        options = {};

    options.html = '<div><a class="h-card" href="http://glennjones.net">Glenn</a></div>';
    Microformats.hasMicroformats(options, function(err, isValid){
        // do something with isValid
    });

Using Async calls

There are promise-based versions of the four public methods; each has the text Async appended to its name. So the names of the promise methods are getAsync, countAsync, isMicroformatAsync and hasMicroformatsAsync.

    var Microformats = require('microformat-node'),
        options = {};

    options.html = '<a class="h-card" href="http://glennjones.net">Glenn</a>';
    Microformats.getAsync(options)
        .then(function (data) {
            // do something with data
        })
        .catch(function (err) {
            // do something with err
        });
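To illustrate how a callback method and its Async counterpart relate, here is a sketch using Node's util.promisify on a stand-in function. fakeGet is hypothetical and only mimics the (options, callback) shape; the library may well build its Async methods differently.

```javascript
var util = require('util');

// Stand-in with the same (options, callback) shape as the callback methods
// above. It is NOT the real parser and always returns an empty result.
function fakeGet(options, callback) {
    setImmediate(function () {
        if (typeof options.html !== 'string') {
            return callback(new Error('options.html must be a string'));
        }
        callback(null, { items: [], rels: {}, 'rel-urls': {} });
    });
}

// util.promisify turns an error-first callback function into one that
// returns a promise.
var fakeGetAsync = util.promisify(fakeGet);

fakeGetAsync({ html: '<p>hi</p>' })
    .then(function (data) {
        console.log(data.items.length);
    })
    .catch(function (err) {
        console.error(err.message);
    });
```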

Version and livingStandard

The library has two properties to help identify how up to date it is:

version (String) internal version number
livingStandard (String ISO Date) the version of https://github.com/microformats/tests currently used.

Microformats definitions object

The library has built-in version 1 microformats definitions, but you can add new definitions using options.add if you wish. Below is an example of a definitions object. Examples of existing definitions can be found in the directory lib/maps. You do not need to add a new definitions object if you are using microformats version 2.

    {
        root: 'hpayment',
        name: 'h-payment',
        properties: {
            'amount': {},
            'currency': {}
        }
    }
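A definitions object can be checked for the expected shape before it is handed to options.add. This is a sketch under stated assumptions: isDefinition is a hypothetical helper, and the rule that name starts with 'h-' is inferred from the example above rather than taken from the library.

```javascript
// Hypothetical helper: checks that an object has the three fields used in
// the example definition above. The 'h-' prefix rule is an assumption.
function isDefinition(def) {
    return Boolean(def) &&
        typeof def.root === 'string' &&
        typeof def.name === 'string' &&
        def.name.indexOf('h-') === 0 &&
        typeof def.properties === 'object' &&
        def.properties !== null;
}

var payment = {
    root: 'hpayment',
    name: 'h-payment',
    properties: {
        'amount': {},
        'currency': {}
    }
};

var options = {
    html: '<div class="hpayment"><span class="amount">36.78</span></div>',
    // Only pass along definitions that look well formed.
    add: [payment].filter(isDefinition)
};

console.log(options.add.length); // 1
```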

Running simple demo page

$ git clone https://github.com/glennjones/microformat-node.git
$ cd microformat-node
$ npm i
$ npm start

Then open http://0.0.0.0:3000

License

MIT © Copyright Glenn Jones

microformat-node's People

Contributors

Stargazers

Watchers

Forkers

chris-rock imclab jbinkleyj vanderwal mko voxpelli pyhedgehog rudhanster andyleap topcheese sgml sebilasse edwardhinkle sknebel siparker microformats qubyte

microformat-node's Issues

parseUrl promise not working

Just wanted to try it out and ran the parseUrl-promise example. However, it always failed with "TypeError: Cannot call method 'then' of undefined"

After checking the source code I saw that the "parseUrl" does not return a promise. The return is actually simply commented out (5546247). Is there a reason for that or did that happen by accident? If not the example should probably get removed.

Wrong type detected?

Testing this library:

var uf  = require("microformat-node");
uf.parseUrl('http://microformats.org/2014/06', {}, function (err, result) {
  console.log(JSON.stringify(result.items, null, 2));
});

This results in the following:

[
  ...,
  {
    "type": [
      "h-card"
    ],
    "properties": {
      "url": [
        "http://tantek.com/"
      ],
      "name": [
        "Tantek"
      ]
    }
  }
]

while viewing the source shows

<address class="vcard"><a class="url fn" href="http://tantek.com/">Tantek</a></address>

It seems the wrong type is detected?
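For context, microformats2 parsers report version 1 root class names under their version 2 equivalents, which is why vcard markup shows up as h-card. The sketch below illustrates that mapping; the table lists only a few well-known roots and backcompatTypes is a hypothetical helper, not the library's code.

```javascript
// A few version 1 root class names and their microformats2 equivalents.
// Illustrative subset only, not the library's full set of definitions.
var BACKCOMPAT_ROOTS = {
    vcard: 'h-card',
    vevent: 'h-event',
    hentry: 'h-entry',
    hreview: 'h-review',
    hproduct: 'h-product',
    adr: 'h-adr',
    geo: 'h-geo'
};

// Hypothetical helper: maps the version 1 root classes found in a class
// attribute to the type names a microformats2 parser would report.
function backcompatTypes(classAttr) {
    return classAttr.split(/\s+/)
        .filter(function (name) {
            return Object.prototype.hasOwnProperty.call(BACKCOMPAT_ROOTS, name);
        })
        .map(function (name) {
            return BACKCOMPAT_ROOTS[name];
        });
}

console.log(backcompatTypes('vcard')); // [ 'h-card' ]
```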

p-name breaks on empty text

The example below is not parsing correctly. I would expect the entry "name" to be the empty string. Adding any non-whitespace text to the e-content causes it to revert to expected behavior.

<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
    <div class="h-entry">
        <a href="http://this.site/photo" class="u-url"></a>
        <div class="e-content p-name"><img src="photo.jpg" class="u-photo"/></div>

        Some extraneous text

        <div class="h-cite">
            <a href="http://someother.site/like" class="u-url"></a>
            <a href="http://this.site/photo" class="u-like-of"></a>
            <div class="e-content p-name">liked this</div>
        </div>
    </div>
</body>
</html>

{ items: 
   [ { type: [ 'h-entry' ],
       properties: 
        { url: [ 'http://this.site/photo' ],
          content: [ { value: '', html: '<img src="photo.jpg" class="u-photo" />' } ],
          photo: [ 'photo.jpg' ],
          name: [ 'Some extraneous text\r\n\r\n        \r\n            \r\n            \r\n            liked this' ] },
       children: 
        [ { value: 'liked this',
            type: [ 'h-cite' ],
            properties: 
             { url: [ 'http://someother.site/like' ],
               'like-of': [ 'http://this.site/photo' ],
               content: [ { value: 'liked this', html: 'liked this' } ],
               name: [ 'liked this' ] } } ] } ],
  rels: {},
  'rel-urls': {} }

The top most level of h-* do not record more than one type

i.e. class="h-entry h-note" is returned as type: ["h-entry"] but should return type: ["h-entry","h-note"]
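The expected behaviour can be sketched as collecting every h-* class name on the element. rootTypes is a hypothetical helper, the pattern is a simplification of the class-name rules, and returning the names unique and sorted follows my reading of the microformats2 parsing specification.

```javascript
// Simplified pattern for microformats2 root class names (assumption: no
// vendor prefixes or other edge cases are handled here).
var ROOT_CLASS = /^h-[a-z]+(-[a-z]+)*$/;

// Hypothetical helper: returns all h-* class names found in a class
// attribute, without duplicates and sorted alphabetically.
function rootTypes(classAttr) {
    var seen = {};
    return classAttr.split(/\s+/)
        .filter(function (name) {
            if (!ROOT_CLASS.test(name) || seen[name]) {
                return false;
            }
            seen[name] = true;
            return true;
        })
        .sort();
}

console.log(rootTypes('h-entry h-note')); // [ 'h-entry', 'h-note' ]
```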

Alternative node/JS microformats parser

This project hasn't been maintained in a number of years.

There is an up-to-date library available here: https://github.com/microformats/microformats-parser, that works both with node.js and in the browser. Also supports TypeScript.

When using two h-* you get duplicate properties

This could either be an issue with the parser, or an unforeseen issue in the spec.

Error when parsing http://www.boemiadigital.com/

trying to run following code:

var uf = require('microformat-node');
uf.parseUrl('http://www.boemiadigital.com/', {}, console.log);

results in an error:

TypeError: Cannot call method 'toString' of undefined
    at Object.Parser.impliedRules (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/lib/parser.js:636:70)
    at null.<anonymous> (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/lib/parser.js:754:14)
    at exports.each (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/node_modules/cheerio/lib/api/traversing.js:125:24)
    at Object.Parser.walkChildren (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/lib/parser.js:655:24)
    at null.<anonymous> (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/lib/parser.js:722:13)
    at exports.each (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/node_modules/cheerio/lib/api/traversing.js:125:24)
    at Object.Parser.walkChildren (/Users/janpotoms/Woorank/code/piccolo/node_modules/microformat-node/lib/parser.js:655:24)
    at null.<anonymous> (/Users/janpotoms/Woorank/code/piccolo/node_modules/mJans-MacBook-Pro:piccolo janpotoms$

Backcompat parsing conflicts

I recently added mf1 markup to my review posts in order to appear as Google Rich Snippets. This has had the unfortunate side effect of confusing the crap out of mf2 parsers. Right now, the python parser is the only one that gets it right.

Original post: https://aaronparecki.com/2016/12/15/16/dropvox
node parsed
python parsed

Note there are 5 empty objects before the real h-review. Additionally the p-item property ended up confused. It should be an h-product, but instead is a weird mix of h-item (where did that come from) and the h-product appears as the url property.

Truncated Dates (bday / without years) are not parsed

Using truncated representations of dates for birth date is often good practice as noted in the vcard spec http://microformats.org/wiki/h-card#dt-bday

"--12-28"
Apart from citing parecki's birthday from the public h-card (send him much gifts) I'll look into it now for a fix [problem is in the intermediate step we would have to use the year 9999 and replace it afterwards, one reason why I am using a format like in my nlp module ].

Feel free to look into the following PR ...
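As an illustration of the input in question, a truncated date such as "--12-28" can be recognised with a small pattern. This is a sketch only, not the fix from the PR; parseTruncatedDate is a hypothetical helper and it does not check the day against the length of the month.

```javascript
// Matches the truncated form --MM-DD (month and day, no year).
var TRUNCATED_DATE = /^--(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/;

// Hypothetical helper: returns the month and day of a truncated date, or
// null when the value is not in --MM-DD form.
function parseTruncatedDate(value) {
    var match = TRUNCATED_DATE.exec(value);
    if (!match) {
        return null;
    }
    return { month: Number(match[1]), day: Number(match[2]) };
}

console.log(parseTruncatedDate('--12-28')); // { month: 12, day: 28 }
```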

Update microformat-shiv dependency reference

I am getting build errors with Travis-CI due to an intermittent 403 response from this api.github URL during npm install, even though it seems to install fine locally:

"microformat-shiv": "https://api.github.com/repos/glennjones/microformat-shiv/tarball/"

Would you be opposed to using the owner/repo convention outlined here?

"microformat-shiv": "glennjones/microformat-shiv"

Some URLs cause the library to "hang"

Parsing the URL "https://www.zalando.nl/nike-performance-trainingsbroek-blackcool-grey-n1241e0em-q11.html" causes the library to hang. It does not crash, but rather hangs for a very long time at 100% cpu-core consumption. Memory does not increase significantly.

Code to reproduce the problem:

const fetch = require('node-fetch');
const mf = require('microformat-node');

const fetchUrl = url => fetch(url)
    .then(res  => res.text())
    .then(html => mf.getAsync({html}));

fetchUrl('https://www.zalando.nl/nike-performance-trainingsbroek-blackcool-grey-n1241e0em-q11.html')

microformat-node version: 2.0.1
node version: v8.9.4 (also tested with v6.11.5 with same problem)

Could you please give me some pointers where to start looking in the code for a possible cause of this problem. Thanks!

Error when parsing 'http://www.limitedtoendodontics.com'

Running the following:

var uf = require('microformat-node');
uf.parseUrl('http://www.limitedtoendodontics.com/', {}, console.log);

I get following error:

SyntaxError: empty sub-selector
  at parse (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:178:9)
  at compileUnsafe (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/lib/compile.js:26:9)
  at select (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/index.js:17:43)
  at CSSselect (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/index.js:40:9)
  at exports.find (/var/www/piccolo/node_modules/cheerio/lib/api/traversing.js:13:21)
  at Object.Parser.appendInclude (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1462:28)
  at Object.Parser.addAttributeIncludes (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1413:11)
  at Object.Parser.addIncludes (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1388:8)
  at Object.Parser.get (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:278:10)
  at Object.parseDom (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:155:17

Remove or separate out parseUrl() to slim down library?

I was playing around a bit trying to see if I could get this library working with React Native and got Cheerio and HTMLParser2 working, but for some reason didn't get this module working and that's probably due to the Request module.

When I use microformat-node I often do it in a context where I already have URL fetching available. In eg. https://github.com/voxpelli/webpage-webmentions I use request myself, but in some newer projects I've started experimenting with other libraries like node-fetch.

I also often do the fetching through something like fetch-politely to ensure proper rate-limiting and robots.txt compliance.

All in all – I rarely use the fetching that this library provides, I just use it for parsing the HTML – either directly or by sending it a cheerio object. So this module including the entire request library is for my use-cases often a bit redundant – and since one can't exclude a dependency it makes it hard to eg. do a slim build for something like React Native where size and complexity might matter more than it does on server side node.js.

So – my proposal would be to either just remove the URL-fetching altogether or to split it up so there's a module that's focused on just the parsing and another that also provides helpers like parseUrl().

What are your thoughts?

crash on certain markup

Library crashes on following markup with a TypeError: Cannot set property '0' of undefined:

var Microformats = require('microformat-node');
Microformats.get({
  html: '<div class="hentry"><div class="dt-">0AM<div class="dt-">x</div></div></div>'
}, function (err, data) {
   console.log(err, data);
});

narrowed down sample from: http://lbpm.com/

also happening on www.nesbitts.com, www.browsbyfay.com, decijisajam.rs, phytocarestetica.com

Parsing an h-entry with a root of <article> results in an empty h-entry

Given the following HTML stored as a variable body in JS (taken from the h-entry example):

<article class="h-entry">
  <h1 class="p-name">Microformats are amazing</h1>
  <p>Published by <a class="p-author h-card" href="http://example.com">W. Developer</a>
     on <time class="dt-published" datetime="2013-06-13 12:00:00">13<sup>th</sup> June 2013</time>
  <p class="p-summary">In which I extoll the virtues of using microformats.</p>
  <div class="e-content">
    <p>Blah blah blah</p>
  </div>
</article>

and the following parsing call (or substituting the parseUrl that returns body as the previous HTML):

microformats.parseHtml(body, {'filters': ['h-entry']});

The result in items in the resultant data object will be an empty h-entry object like the following:

{
  "items": [{
    "type": ["h-entry"],
    "properties": {}
  }],
  "rels":{}
}

Converting the article element into a div element results in a successful, fully-parsed h-entry object like so:

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "author": [
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "name": [
                                "W. Developer"
                            ],
                            "url": [
                                "http:\/\/example.com"
                            ]
                        },
                        "value": "W. Developer"
                    }
                ],
                "name": [
                    "Microformats are amazing"
                ],
                "summary": [
                    "In which I extoll the virtues of using microformats."
                ],
                "published": [
                    "2013-06-13 12:00:00"
                ],
                "content": [
                    {
                        "html": "&#xD;\n    <p>Blah blah blah<\/p>&#xD;\n  ",
                        "value": "Blah blah blah"
                    }
                ]
            }
        }
    ],
    "rels": {}
}

I haven't had time to investigate the problem fully yet, but I believe that the source of the problem might be in the use of cheerio and its support for HTML5 elements. If it isn't, then it's something directly in microformat-node.

Dependency "ent" uses deprecated Node punycode module

This library uses the abandoned package "ent", which uses the deprecated Node punycode module.

Temporary solution: https://www.npmjs.com/package/ent-replace

crash on certain input

following code crashes with a TypeError: Cannot read property 'replace' of null:

var Microformats = require('microformat-node');
Microformats.get({
  html: '<div class="include item"></div>'
}, function (err, data) {
   console.log(err, data);
});

also fails on input:

'<div class="include"></div><div class="item"></div>'

narrowed down sample from http://www.advodata.be/

also happening on www.netflow.ro

parseUrl example results in an error

microformats.parseUrl('http://glennjones.net/about', options, function(err, data){
  if (err) throw (err);
  console.log(data);
});

results in:

TypeError: Cannot read property '0' of null
    at getName (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:78:36)
    at getLCName (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:84:14)
    at parse (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:160:12)
    at compileUnsafe (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/lib/compile.js:26:9)
    at select (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/index.js:17:43)
    at CSSselect (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/node_modules/CSSselect/index.js:40:9)
    at exports.find (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/lib/api/traversing.js:7:21)
    at new module.exports (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/lib/cheerio.js:83:18)
    at initialize (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/node_modules/cheerio/lib/static.js:19:12)
    at Object.Parser.apppendInclude (/Users/bret/Documents/Git-Clones/iwc-log-feed/node_modules/microformat-node/lib/parser.js:1446:13)

Digging around a bit to see what the cause is. Parsing other URLs seems to work though.

HTML entity handling

HTML entities are indistinguishable from actual tags in the "html" part of e- properties. Additionally, value and p- properties may be corrupted when using textFormat: whitespacetrimmed.

Test case:

<div class="h-entry">
    <div class="p-name e-content">x&lt;y AT&amp;T &lt;b&gt;NotBold&lt;/b&gt; <b>Bold</b></div>
</div>

Output (textFormat: 'whitespacetrimmed'):

{
    "items": [{
        "type": ["h-entry"],
        "properties": {
            "name": ["xNotBold Bold"],
            "content": [{
                "value": "xNotBold Bold",
                "html": "x<y AT&T <b>NotBold</b> <b>Bold</b>"
            }]
        }
    }],
    "rels": {},
    "rel-urls": {}
}

Output (textFormat: 'normalised'):

{
    "items": [{
        "type": ["h-entry"],
        "properties": {
            "name": ["x<y AT&T <b>NotBold</b> Bold"],
            "content": [{
                "value": "x<y AT&T <b>NotBold</b> Bold",
                "html": "x<y AT&T <b>NotBold</b> <b>Bold</b>"
            }]
        }
    }],
    "rels": {},
    "rel-urls": {}
}

Expected for both cases:

{
    "items": [{
        "type": ["h-entry"],
        "properties": {
            "name": ["x<y AT&T <b>NotBold</b> Bold"],
            "content": [{
                "value": "x<y AT&T <b>NotBold</b> Bold",
                "html": "x&lt;y AT&amp;T &lt;b&gt;NotBold&lt;/b&gt; <b>Bold</b>"
            }]
        }
    }],
    "rels": {},
    "rel-urls": {}
}
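The difference between the plain-text value and the expected html field comes down to escaping. A minimal sketch, with escapeHtml as a hypothetical helper rather than anything from the library:

```javascript
// Hypothetical helper: escapes the characters that would otherwise be read
// as markup. The ampersand must be replaced first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Plain-text form of the first part of the test case above.
var text = 'x<y AT&T <b>NotBold</b>';

console.log(escapeHtml(text)); // x&lt;y AT&amp;T &lt;b&gt;NotBold&lt;/b&gt;
```

The escaped string matches the start of the html field in the expected output, while the unescaped string is what the value field should carry.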

Can't find '../test/testWriter.js' from bin/microformat-node

The likely cause is that test/testWriter.js was removed in: c44e1da

It means that bin/microformat-node refuses to start now though.

Not sure what function it played though so can't provide an easy PR to fix it unfortunately.

parseDom (feature request)

Current parseDom operates on jQuery-like objects.

Are you planning implementing parser for real DOM API (getElementsByClassName and so on - i.e. jsdom implements this)?
If not - can I volunteer? (no sense in starting if you would reject the pull request)
If yes - what should the function names be? (either we should rename the old one, or somehow mangle the new one)

CSS selector no longer works

Error when parsing www.markgordondentistry.com

I get an error when running the following:

var uf = require('microformat-node');
uf.parseUrl('http://www.markgordondentistry.com/', {}, console.log);

TypeError: Cannot read property '0' of null
  at getName (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:76:36)
  at parse (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/node_modules/CSSwhat/index.js:129:13)
  at compileUnsafe (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/lib/compile.js:26:9)
  at select (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/index.js:17:43)
  at CSSselect (/var/www/piccolo/node_modules/cheerio/node_modules/CSSselect/index.js:40:9)
  at exports.find (/var/www/piccolo/node_modules/cheerio/lib/api/traversing.js:13:21)
  at Object.Parser.appendInclude (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1462:28)
  at Object.Parser.addAttributeIncludes (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1413:11)
  at Object.Parser.addIncludes (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:1388:8)
  at Object.Parser.get (/var/www/piccolo/node_modules/microformat-node/lib/parser.js:278:10)

It seems this page has a itemref='Mark J. Gordon, DDS' in it somewhere which makes the engine choke.
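If the failure does come from turning each itemref token into a '#id' selector (an assumption, the stack trace only shows the selector parser throwing inside appendInclude), then an attribute selector is one way to avoid it, since it accepts characters such as '.' and ','. The helper names below are hypothetical.

```javascript
// Hypothetical helper: builds an attribute selector for an id, escaping
// backslashes and double quotes so the value stays inside the quotes.
function idSelector(id) {
    var escaped = id.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return '[id="' + escaped + '"]';
}

// itemref holds a space-separated list of ids, so each token gets its
// own selector.
function itemrefSelectors(itemref) {
    return itemref.split(/\s+/)
        .filter(Boolean)
        .map(idSelector);
}

console.log(itemrefSelectors('Mark J. Gordon, DDS'));
// [ '[id="Mark"]', '[id="J."]', '[id="Gordon,"]', '[id="DDS"]' ]
```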

Cache starts a setTimeout even when unused

Hi Glenn, awesome module!

The checkLimits function in lib/cache.js is automatically called when requiring microformat-node, even when the Parser doesn't use caching. Personally, this has prevented an app from closing (due to the setTimeout still running) and also has caused tests (from a module that I'm working on) from exiting.

If possible, it would be great if the cache didn't automatically start until its first use. I'm just knocking up a quick feature branch now that does this and will push it up shortly so that it can be discussed further.
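The idea of starting the timer lazily can be sketched as follows. LazyCache is a hypothetical stand-in, not the code in lib/cache.js; it starts its interval on first write and calls unref() so that the timer alone does not keep the process alive.

```javascript
// Hypothetical stand-in for a cache that starts its housekeeping timer on
// first use instead of at require time.
function LazyCache(checkIntervalMs) {
    this.items = {};
    this.timer = null;
    this.checkIntervalMs = checkIntervalMs || 60000;
}

LazyCache.prototype.start = function () {
    if (this.timer) {
        return;
    }
    var self = this;
    this.timer = setInterval(function () {
        self.checkLimits();
    }, this.checkIntervalMs);
    // An unref'd timer does not stop Node from exiting on its own.
    this.timer.unref();
};

// Placeholder: a real cache would evict expired or excess entries here.
LazyCache.prototype.checkLimits = function () {};

LazyCache.prototype.set = function (key, value) {
    this.start();
    this.items[key] = value;
};

LazyCache.prototype.get = function (key) {
    return this.items[key];
};

LazyCache.prototype.stop = function () {
    clearInterval(this.timer);
    this.timer = null;
};

var cache = new LazyCache(1000);
console.log(cache.timer === null); // true
cache.set('http://example.com/', '<html></html>');
console.log(cache.timer === null); // false
```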

Regression in text parsing of weird XSS:ish content?

Not sending a PR for this one because neither do I exactly know what's causing it or do I think it's that big of a deal that it's worth spending a lot of time on debugging, but making a note here anyhow so that it's documented.

In the checkmention project by @kbsriram there's an XSS test that parses differently now compared to version 0.3.x: https://github.com/kbsriram/checkmention/blob/master/src/WEB-INF/checks/xss

More specifically – the 2.0.0 version of this library is now ignoring the final </script> in the <<SCRIPT>alert("XSS4");//<</SCRIPT> code in there and thus makes all of the remaining e-content be treated as content of the script tag and thus drops all of that content from the text. This didn't happen before.

This means that the text output of parsing that XSS file is now:

Clicking this\nshould not cause an alert.\nThis div\nshould not alert.\nTry clicking this link\n<script>alert(\"encoded-xss\")</script>\nand this too.\nMouse over this\nshould not cause an alert. This broken\n should not throw an alert.\n<

When it before was:

Clicking this\nshould not cause an alert.\nThis div\nshould not alert.\nTry clicking this link\n<script>alert(\"encoded-xss\")</script>\nand this too.\nMouse over this\nshould not cause an alert. This broken\n should not throw an alert.\n<alert(\"XSS4\");//\n\nNeither should .\nPlease look at the Owasp XSS prevention cheat sheet for more information.\n\n\nThis note was created on\n\n%%nice_time

When looking at the same content directly in the browser I would say that the former handling was more correct than the current, but I'm not sure what has changed. Cheerio is still the same, so it must be something outside of Cheerio's control?

But – as I said in the beginning – this feels like an edge case that's not really worth spending a lot of time on fixing if it isn't an indicator of something bigger which I don't think it is.

href ignored for u-* properties

In the following example the parser uses the textContent for u-url and u-uid. Based on the u-* parsing rules from http://microformats.org/wiki/microformats2-parsing I was expecting to get the href attribute, since rule 1 says: if a.u-x[href] or area.u-x[href] or link.u-x[href], then get the href attribute.

var chai = require('chai'),
   assert = chai.assert,
   helper = require('../test/helper.js');


describe('h-entry', function() {
   var htmlFragment = "<li class=\"h-entry hentry\">\r\n  <p class=\"p-name entry-title e-content entry-content\">\r\n    Google the company *already* effectively rebranded, into Alphabet_Inc.\r\n  <\/p>\r\n  <span class=\"footer\"><a href=\"2019\/156\/t2\/\" class=\"dt-published published dt-updated updated u-url u-uid\"><time class=\"value\" datetime=\"11:56-0700\">11:56<\/time> on <time class=\"value\">2019-06-05<\/time><\/a><\/span>\r\n<\/li>";
   var found = helper.parseHTML(htmlFragment,'http://example.com/');
   var expected = "/2019/156/t2/";

   it('u-url', function(){
      var url = found.items[0].properties.url[0];
      assert.equal(url, expected);
   });

   it('u-uid', function(){
      var uid = found.items[0].properties.uid[0];
      assert.equal(uid, expected);
   });
});
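For reference, this is how the relative href from the fragment resolves against the base URL passed to the helper, using Node's built-in URL class. resolveHref is a hypothetical helper, and the sketch only shows standard URL resolution; it does not claim what the parser should return for this test.

```javascript
// Hypothetical helper: resolves an href against a base URL with the WHATWG
// URL class. Falls back to the raw href if the input cannot be parsed.
function resolveHref(href, baseUrl) {
    try {
        return new URL(href, baseUrl).href;
    } catch (err) {
        return href;
    }
}

console.log(resolveHref('2019/156/t2/', 'http://example.com/'));
// http://example.com/2019/156/t2/
```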
