Giter VIP home page Giter VIP logo

spiderable-middleware's Introduction

Spiderable middleware

Google, Facebook, Twitter, Yahoo, and Bing and all other crawlers and search engines are constantly trying to view your website. If your website is built on top of the JavaScript framework like, but not limited to - Angular, Backbone, Ember, Meteor, React, MEAN most of the front-end solutions returns basic HTML-markup and script-tags to crawlers, but not content of your page. The mission of spiderable-middleware and ostr.io are to boost your SEO experience without a headache.

About Package

This package acts as middleware and intercepts requests to your Node.js application from web crawlers. All requests proxy passed to the Spiderable (Prerender) Service, which returns static, rendered HTML.

  • Note: This package proxies real HTTP headers and response code, to reduce overwhelming requests, try to avoid HTTP-redirect headers, like Location and others. Read how to return genuine status code and handle JS-redirects.
  • Note: This is server only package. For isomorphic environments, like Meteor.js, this package should be imported/initialized only at server code base.

This middleware was tested and works like a charm with:

All other frameworks which follow Node's middleware convention - will work too.

This package was originally developed for ostr.io service. But it's not limited to, and can proxy-pass requests to any other rendering-endpoint.

ToC

Installation

NPM:

npm install spiderable-middleware

Meteor:

meteor add webapp
meteor add ostrio:spiderable-middleware

Basic usage

See all examples.

First, add fragment meta-tag to your HTML template:

<html>
  <head>
    <meta name="fragment" content="!">
    <!-- ... -->
  </head>
  <body>
    <!-- ... -->
  </body>
</html>
const express    = require('express');
const app        = express();
const Spiderable = require('spiderable-middleware');
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
});

app.use(spiderable.handler).get('/', (req, res) => {
  res.send('Hello World');
});

app.listen(3000);

We provide various options for serviceURL as "Rendering Endpoints", each has its own features, to fit every project needs.

Meteor specific usage

// meteor add webapp
// meteor add ostrio:spiderable-middleware
import { WebApp } from 'meteor/webapp';
import Spiderable from 'meteor/ostrio:spiderable-middleware';

WebApp.connectHandlers.use(new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
}));

Return genuine status code

To pass expected response code from front-end JavaScript framework to browser/crawlers, you need to create specially formatted HTML-comment. This comment can be placed in any part of HTML-page. head or body tag is the best place for it.

Format (html):

<!-- response:status-code=404 -->

Format (jade):

// response:status-code=404

This package support all standard and custom status codes:

  • 201 - <!-- response:status-code=201 -->
  • 401 - <!-- response:status-code=401 -->
  • 403 - <!-- response:status-code=403 -->
  • 499 - <!-- response:status-code=499 --> (non-standard)
  • 500 - <!-- response:status-code=500 -->
  • 514 - <!-- response:status-code=514 --> (non-standard)

Note: Reserved status codes for internal service communications: 49[0-9].

Speed-up rendering

To speed-up rendering, you should tell to the Spiderable engine when your page is ready. Set window.IS_RENDERED to false, and once your page is ready set this variable to true. Example:

<html>
  <head>
    <meta name="fragment" content="!">
    <script>
      window.IS_RENDERED = false;
    </script>
  </head>
  <body>
    <!-- ... -->
    <script type="text/javascript">
      //Somewhere deep in your app-code:
      window.IS_RENDERED = true;
    </script>
  </body>
</html>

JavaScript redirects

If you need to redirect browser/crawler inside your application, while a page is loading (imitate navigation), you're free to use any of classic JS-redirects as well as your framework's navigation, or History.pushState()

window.location.href = 'http://example.com/another/page';
window.location.replace('http://example.com/another/page');

Router.go('/another/page'); // framework's navigation

Note: Only 4 redirects are allowed during one request after 4 redirects session will be terminated.

API

Constructor new Spiderable([opts])

  • opts {Object} - Configuration options
  • opts.serviceURL {String} - Valid URL to Spiderable endpoint (local or foreign). Default: https://render.ostr.io. Can be set via environment variables: SPIDERABLE_SERVICE_URL or PRERENDER_SERVICE_URL
  • opts.rootURL {String} - Valid root URL of your website. Can be set via an environment variable: ROOT_URL (common for meteor)
  • opts.auth {String} - [Optional] Auth string in next format: user:pass. Can be set via an environment variables: SPIDERABLE_SERVICE_AUTH or PRERENDER_SERVICE_AUTH. Default null
  • opts.bots {[String]} - [Optional] An array of strings (case insensitive) with additional User-Agent names of crawlers you would like to intercept. See default bot's names
  • opts.ignore {[String]} - [Optional] An array of strings (case sensitive) with ignored routes. Note: it's based on first match, so route /users will cause ignoring of /part/users/part, /users/_id and /list/of/users, but not /user/_id or /list/of/blocked-users. Default null
  • opts.only {[String|RegExp]} - [Optional] An array of strings (case sensitive) or regular expressions (could be mixed). Define exclusive route rules for prerendering. Could be used with opts.onlyRE rules. Note: To define "safe" rules as {RegExp} it should start with ^ and end with $ symbols, examples: [/^\/articles\/?$/, /^\/article/[A-z0-9]{16}\/?$/]
  • opts.onlyRE {RegExp} - [Optional] Regular Expression with exclusive route rules for prerendering. Could be used with opts.only rules

Note: Setting .onlyRE and/or .only rules are highly recommended. Otherwise, all routes, including randomly generated by bots will be subject of Prerendering and may cause unexpectedly higher usage.

// CommonJS
// const Spiderable = require('spiderable-middleware');

// Meteor.js
// import Spiderable from 'meteor/ostrio:spiderable-middleware';

const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass'
});

// More complex setup (recommended):
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass',
  only: [
    /\/?/, // Root of the website
    /^\/posts\/?$/, // "/posts" path with(out) trailing slash
    /^\/post\/[A-z0-9]{16}\/?$/ // "/post/:id" path with(out) trailing slash
  ],
  // [Optionally] force ignore for secret paths:
  ignore: [
    '/account/', // Ignore all routes under "/account/*" path
    '/billing/' // Ignore all routes under "/billing/*" path
  ]
});

spiderable.handler(req, res, next)

Middleware handler. Alias: spiderable.handle.

// Express, Connect:
app.use(spiderable.handler);

// Meteor:
WebApp.connectHandlers.use(spiderable);

//HTTP(s) Server
http.createServer((req, res) => {
  spiderable.handler(req, res, () => {
    // Callback, triggered if this request
    // is not a subject of spiderable prerendering
    res.writeHead(200, {'Content-Type': 'text/plain; charset=UTF-8'});
    res.end('Hello vanilla NodeJS!');
    // Or do something else ...
  });
}).listen(3000);

AMP Support

To properly serve pages for Accelerated Mobile Pages (AMP) we support following URI schemes:

# Regular URIs:
https://example.com/index.html
https://example.com/articles/article-title.html
https://example.com/articles/article-uniq-id/article-slug

# AMP optimized URIs (prefix):
https://example.com/amp/index.html
https://example.com/amp/articles/article-title.html
https://example.com/amp/articles/article-uniq-id/article-slug

# AMP optimized URIs (extension):
https://example.com/amp/index.amp.html
https://example.com/amp/articles/article-title.amp.html

All URLs with .amp. extension and /amp/ prefix will be optimized for AMP.

Running Tests

  1. Clone this package
  2. In Terminal (Console) go to directory where package is cloned
  3. Then run:

Node.js/Mocha

# Install development NPM dependencies:
npm install --save-dev
# Install NPM dependencies:
npm install --save
# Run tests:
ROOT_URL=http://127.0.0.1:3003 npm test
# http://127.0.0.1:3003 can be changed to any local address, PORT is required!

Meteor/Tinytest

meteor test-packages ./ --port 3003
# PORT is required, and can be changed to any local open port

spiderable-middleware's People

Contributors

dr-dimitru avatar

Watchers

James Cloos avatar NICK avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.