gulp-sitemap's Introduction

gulp-sitemap

Generate a search engine friendly sitemap.xml using a Gulp stream

Easily generate a search engine friendly sitemap.xml from your project.

Search engines love sitemap.xml, and it helps SEO as well.

For information about sitemap properties and structure, see the wiki for sitemaps

Install

Install with npm

$ npm install --save-dev gulp-sitemap

Example

var gulp = require('gulp');
var sitemap = require('gulp-sitemap');

gulp.task('sitemap', function () {
    // Return the stream so gulp can tell when the task has finished
    return gulp.src('build/**/*.html', {
            read: false
        })
        .pipe(sitemap({
            siteUrl: 'http://www.amazon.com'
        }))
        .pipe(gulp.dest('./build'));
});
  • siteUrl is required.
  • index.html will be turned into directory path /.
  • 404.html will be skipped automatically. No need to unglob it.
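The path handling described in the notes above can be pictured with a small stand-alone sketch. `isSkipped` and `toLoc` are hypothetical helpers written for illustration, not the plugin's internals. They assume forward-slash relative paths, assume 404.html is skipped at any depth, and use `http://www.example.com` as a placeholder site.

```javascript
// Hypothetical helpers: they illustrate the documented behaviour, not the plugin's code.
function isSkipped(relativePath) {
    // 404.html is left out of the sitemap (assumed here to apply at any depth)
    return /(^|\/)404\.html?$/.test(relativePath);
}

function toLoc(siteUrl, relativePath) {
    // index.html collapses to its directory path
    var pagePath = relativePath.replace(/(^|\/)index\.html?$/, '$1');
    return siteUrl.replace(/\/$/, '') + '/' + pagePath;
}

var pages = ['index.html', 'about.html', 'blog/index.html', '404.html'];
var locs = pages
    .filter(function (page) { return !isSkipped(page); })
    .map(function (page) { return toLoc('http://www.example.com', page); });
```

Under these assumptions `locs` holds the site root, `about.html`, and the `blog/` directory URL, with no entry for `404.html`.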

Let's see an example of how we can create and output a sitemap, and then return to the original stream files:

var gulp = require('gulp');
var sitemap = require('gulp-sitemap');
var save = require('gulp-save');

gulp.task('html', function() {
    // Return the stream so gulp can tell when the task has finished
    return gulp.src('*.html', {
          read: false
        })
        .pipe(save('before-sitemap'))
        .pipe(sitemap({
                siteUrl: 'http://www.amazon.com'
        })) // Returns sitemap.xml
        .pipe(gulp.dest('./dist'))
        .pipe(save.restore('before-sitemap')) //restore all files to the state when we cached them
        // -> continue stream with original html files
        // ...
});

Options

siteUrl

Your website's base URL. It is prepended to every document location.

Type: string

Required: true

fileName

Determine the output filename for the sitemap.

Type: string

Default: sitemap.xml

Required: false

changefreq

Gets filled inside the sitemap in the tag <changefreq>. Not added by default.

Type: string

Default: undefined

Valid Values: ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never']

Required: false

Note: any falsey value is also valid and will skip this xml tag

priority

Gets filled inside the sitemap in the tag <priority>. Not added by default.

Type: string|function

Default: undefined

Valid Values: 0.0 to 1.0

Required: false

Note: any falsey (non-zero) value is also valid and will skip this xml tag

Example using a function as priority:

priority: function(siteUrl, loc, entry) {
    // Give pages inside root path (i.e. no slashes) a higher priority.
    // split() always returns at least one element, so "no slashes" means length 1.
    return loc.split('/').length === 1 ? 1 : 0.5;
}
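To see what such a function returns, it can be called directly, outside gulp. This sketch assumes `loc` is a site-relative path without a leading slash, which is an assumption worth checking against what the plugin actually passes. It compares against a length of 1 because `split` returns a single element when the string contains no slash. The site URL is a placeholder.

```javascript
// Sketch: assumes loc is a site-relative path such as 'about.html' (assumption).
function priority(siteUrl, loc, entry) {
    // 'about.html'.split('/') has one element, so "no slashes" means length 1
    return loc.split('/').length === 1 ? 1 : 0.5;
}

var rootPriority = priority('http://www.example.com', 'about.html', {});
var nestedPriority = priority('http://www.example.com', 'blog/post.html', {});
```

If `loc` turns out to be a full URL, the slashes in the scheme and host would need to be removed before counting.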

lastmod

The file last modified time.

  • If null, the plugin tries to read the last modified time from the vinyl file in the stream, and falls back to Date.now().
  • If the value is not null, it is used as lastmod.
  • When lastmod is a function, it is called with the current file as its parameter. (Note: the function is expected to be synchronous.)
  • A string can be used to set a fixed lastmod manually.

Type: string|datetime|function

Default: null

Required: false

Example that uses git to get lastmod from the latest commit of a file:

// Assumes execSync is in scope: var execSync = require('child_process').execSync;
lastmod: function(file) {
  var cmd = 'git log -1 --format=%cI "' + file.relative + '"';
  return execSync(cmd, {
    cwd: file.base
  }).toString().trim();
}

Note: any falsey (other than null) value is also valid and will skip this xml tag
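A function-valued lastmod can also be exercised without running gulp. The sketch below assumes the file object carries an `fs.Stats`-like `stat` with an `mtime` date, as vinyl files read from disk normally do. `lastmodFromStat` and the fallback date are illustrative names and values, not part of the plugin.

```javascript
// Hypothetical lastmod function: prefer the file's mtime, else a fixed fallback.
var FALLBACK = '2020-01-01T00:00:00.000Z'; // illustrative value

function lastmodFromStat(file) {
    if (file.stat && file.stat.mtime instanceof Date) {
        return file.stat.mtime.toISOString();
    }
    return FALLBACK;
}

// Plain objects standing in for vinyl files
var withStat = {
    relative: 'index.html',
    stat: { mtime: new Date('2016-02-26T16:57:29.000Z') }
};
var withoutStat = { relative: 'about.html', stat: null };

var fromStat = lastmodFromStat(withStat);
var fromFallback = lastmodFromStat(withoutStat);
```

Such a function could then be passed as `lastmod: lastmodFromStat` in the options.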

newLine

The line separator used to join lines in the generated sitemap file.

Type: string

Default: Your OS's new line, mostly: \n

Required: false

spacing

The indentation string used in the sitemap XML file. Use \t for tabs, or two spaces if you prefer.

Type: string

Default: (4 spaces)

Required: false
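The combined effect of spacing and newLine is easiest to see on a single entry. `renderUrl` below is a hypothetical renderer written for this illustration. It is not how the plugin builds its XML, and the URL is a placeholder.

```javascript
// Hypothetical renderer: shows how an indent string and a line separator shape the output.
function renderUrl(loc, spacing, newLine) {
    return [
        spacing + '<url>',
        spacing + spacing + '<loc>' + loc + '</loc>',
        spacing + '</url>'
    ].join(newLine);
}

var tabbed = renderUrl('http://www.example.com/', '\t', '\n');
var twoSpaces = renderUrl('http://www.example.com/', '  ', '\r\n');
```

The matching plugin options would be `spacing: '\t'` with `newLine: '\n'`, or `spacing: '  '` with `newLine: '\r\n'`.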

noindex

Exclude pages from the sitemap when the robots meta tag is set to noindex. The plugin needs to be able to read the contents of the files for this to have an effect.

Type: boolean

Default: false

Required: false
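As a rough picture of what "robots meta tag set to noindex" means, here is a stand-alone check. `hasNoindex` is a hypothetical regex-based helper, not the plugin's parser, and it only handles quoted `content` attributes.

```javascript
// Hypothetical check: looks for <meta name="robots" content="...noindex...">.
function hasNoindex(html) {
    var metaTags = html.match(/<meta\b[^>]*>/gi) || [];
    return metaTags.some(function (tag) {
        return /name\s*=\s*["']?robots["']?/i.test(tag) &&
            /content\s*=\s*["'][^"']*\bnoindex\b/i.test(tag);
    });
}

var hidden = hasNoindex('<head><meta name="robots" content="noindex, nofollow"></head>');
var visible = hasNoindex('<head><meta name="robots" content="index, follow"></head>');
var empty = hasNoindex('');
```

Because file contents are needed, the `read: false` option used in the examples above has to be dropped from `gulp.src` when noindex is enabled.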

images

To generate image sitemap entries for each page, set the images flag to true.

Type: boolean

Default: undefined

Required: false

mappings

An array of objects that map pages to their own configuration. See the example below for the structure.

Type: array

Default: []

Required: false

Example:

mappings: [{
    pages: [ 'minimatch pattern' ],
    changefreq: 'hourly',
    priority: 0.5,
    lastmod: Date.now(),
    getLoc(siteUrl, loc, entry) {
        // Removes the file extension if it exists
        return loc.replace(/\.\w+$/, '');
    },
    hreflang: [{
        lang: 'ru',
        getHref(siteUrl, file, lang, loc) {
            return 'http://www.amazon.ru/' + file;
        }
    }]
},
//....
]
  • Every file is matched against the supplied patterns.
  • Only the attributes defined for a matched file are applied.
  • Only the first match applies; subsequent matches for the same filename are ignored.
  • Possible attributes to set: hreflang, changefreq, priority, loc and lastmod.
  • All rules that apply to the options also apply to the attributes that can be overridden.

pages

Type: array

Required: true

This is an array with minimatch patterns to match the relevant pages to override. Every file will be matched against the supplied patterns.

Uses multimatch to match patterns against filenames.

Example: pages: ['home/index.html', 'home/see-*.html', '!home/see-admin.html']
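The "first match wins" rule can be sketched as follows. The `matches` helper is a deliberately simplified stand-in that only understands `*` within one path segment and has no support for negated patterns. It is not multimatch, and `firstMapping` is a hypothetical name.

```javascript
// Simplified stand-in for pattern matching: supports only '*' within one path segment.
function matches(pattern, file) {
    var escaped = pattern
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except '*'
        .replace(/\*/g, '[^/]*');              // '*' matches anything but a slash
    return new RegExp('^' + escaped + '$').test(file);
}

// Return the first mapping with a matching pattern, or null if none match.
function firstMapping(mappings, file) {
    for (var i = 0; i < mappings.length; i++) {
        var hit = mappings[i].pages.some(function (pattern) {
            return matches(pattern, file);
        });
        if (hit) {
            return mappings[i];
        }
    }
    return null;
}

var mappings = [
    { pages: ['home/index.html'], priority: 1 },
    { pages: ['home/*.html'], priority: 0.5, changefreq: 'weekly' }
];

var forIndex = firstMapping(mappings, 'home/index.html'); // both match, the first wins
var forOther = firstMapping(mappings, 'home/see-more.html');
var forNone = firstMapping(mappings, 'about.html');
```

Here `home/index.html` gets priority 1 and no changefreq from the second mapping, which is why more specific patterns belong earlier in the array.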

hreflang

Matching pages can get their hreflang tags set using this option.

The input is an array like so:

hreflang: [{
    lang: 'ru',
    getHref: function(siteUrl, file, lang, loc) {
        // return href src for the hreflang. For example:
        return 'http://www.amazon.ru/' + file;
    }
}]

getLoc

Matching pages can get their loc tag modified by using a function.

getLoc: function(siteUrl, loc, entry) {
    return loc.replace(/\.\w+$/, '');
}
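One thing to watch with extension stripping: if the string handed to getLoc is a full URL with no file part (the bare site root, for example), a plain `/\.\w+$/` replace also removes the top-level domain, as in the ".com removed" report in the issues. The sketch below assumes `loc` may be either a full URL or a relative path, which is an assumption about the plugin version in use, and strips the extension only from the part after `siteUrl`. `stripExtension` is an illustrative name and the domain is a placeholder.

```javascript
// Hypothetical getLoc: strips a file extension from the path only, never from the host.
function stripExtension(siteUrl, loc, entry) {
    var hasPrefix = loc.indexOf(siteUrl) === 0;
    var prefix = hasPrefix ? siteUrl : '';
    var pagePath = hasPrefix ? loc.slice(siteUrl.length) : loc;
    return prefix + pagePath.replace(/\.\w+$/, '');
}

var root = stripExtension('https://example.com', 'https://example.com', {});
var page = stripExtension('https://example.com', 'https://example.com/inventory.html', {});
var relative = stripExtension('https://example.com', 'inventory.html', {});

// For comparison, the plain regex applied to a bare site root:
var naive = 'https://example.com'.replace(/\.\w+$/, '');
```

It could be passed as `getLoc: stripExtension`.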

verbose

Type: boolean

Required: false

Default: false

If true, logs the number of files that were handled.

Complementary plugins

  • gulp-sitemap-files - Get all files listed in a sitemap (Perhaps one generated from this plugin)

Thanks

To grunt-sitemap for the inspiration on writing this.

License

MIT © Gilad Peleg

gulp-sitemap's People

Contributors

alex-chuev, derhuerst, fabiangigler, frank3k, hupfis, nektro, owlbertz, paulcr, pgilad, simaodeveloper, simonknittel, thedancingcode

gulp-sitemap's Issues

Error in plugin "gulp-sitemap"

Environment:
node -v v10.15.3
npm -v 6.4.1
gulp -v CLI version 2.0.1 / Local version 4.0.0

I created a clean test with 3 minimal html files (index, about and 404) and this gulpfile:

var gulp = require('gulp');
var sitemap = require('gulp-sitemap');

gulp.task('sitemap', function () {
    gulp.src('*.html', {
            read: false
        })
        .pipe(sitemap({
            siteUrl: 'http://www.test.com'
        }))
        .pipe(gulp.dest('/'));
});

exports.default = sitemap;

My console reports an error:

sitemap $ gulp
[00:17:43] Using gulpfile ~/WebServer/sitemap/gulpfile.js
[00:17:43] Starting 'default'...
[00:17:43] 'default' errored after 4.4 ms
[00:17:43] Error in plugin "gulp-sitemap"
Message:
    siteUrl is a required param
Details:
    domain: [object Object]
    domainThrown: true

sitemap $
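A likely reading of this log: `exports.default = sitemap` exports the plugin factory itself rather than the registered task. gulp calls an exported task function with a completion callback, so the factory would receive that callback where it expects its options and find no siteUrl. The sketch below reproduces the mechanism with a stub. `sitemapStub` and `runAsGulpWould` are hypothetical stand-ins, not gulp or gulp-sitemap code.

```javascript
// Stand-in for the plugin factory (hypothetical stub): it only reproduces
// the "siteUrl is a required param" check seen in the log above.
function sitemapStub(options) {
    if (!options || !options.siteUrl) {
        throw new Error('siteUrl is a required param');
    }
    return { siteUrl: options.siteUrl };
}

// gulp calls an exported task function with a completion callback.
function runAsGulpWould(task) {
    var done = function () {};
    try {
        return task(done);
    } catch (err) {
        return err.message;
    }
}

// What `exports.default = sitemap` amounts to: the callback arrives as "options".
var wrongResult = runAsGulpWould(sitemapStub);

// What was probably intended: export a task that calls the factory with options.
function sitemapTask() {
    return sitemapStub({ siteUrl: 'http://www.test.com' });
}
var rightResult = runAsGulpWould(sitemapTask);
```

In the real gulpfile, exporting the registered task, for instance `exports.default = gulp.series('sitemap')` with the task body returning its stream, would avoid handing the callback to the plugin. Note also that `gulp.dest('/')` points at the filesystem root; `'./'` may have been intended.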

Mention read:false in README.md

In the gulpfile.js I saw that { read: false } is passed as an argument to gulp.src(). I did not know about this functionality, and I think it would be an improvement to include it in the example in README.md.

Feature request: get last modification time from git

I think it would be a nice addition if the last modification time could be taken from the git log of the corresponding HTML file. With the default option, the last modification time is taken from the file itself (if I understand correctly). For git, this is by default the time the file was created or updated using git pull. Furthermore, when using a beautifier such as gulp-jsbeautifier, the modification date is updated upon beautification.

One can get the last modification time (in ISO-8601 notation) of a file (e.g. index.html) using the following git command:

git log -1 --format=%cI index.html

Using plugins like gulp-git this can be embedded in a plugin as follows (pseudo-code):

  var path = ....;
  var cmd = 'log -1 --format=%cI ' + path;
  return git.exec({ args: cmd, cwd: cwd, quiet: true }, function(err, stdout){
    // Only use the output if git succeeded and printed a date
    if (!err && stdout) {
      var lastmod = stdout.trim();
    }
  });

Pages named xxx_index.html

I have some pages that are named "something_index.html". How can I get them included in the sitemap? I'm seeing results that look like this:

<url>
        <loc>https://www.mysite.com/something_</loc>
        <lastmod>2016-02-26T16:57:29.000Z</lastmod>
</url>

My gulp instructions

gulp.task('sitemap', function () {
    gulp.src([
            '/Users/WebServer/mysite.com/**/*.html', 
            '!/Users/WebServer/mysite.com/dont-index/**/*.*'
        ], {read: false})
        .pipe(sitemap({siteUrl: 'https://www.mysite.com'}))
        .pipe(gulp.dest('/Users/WebServer/production_files'))
        .pipe(notify("Sitemap generated"));
});
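The truncated `something_` location is consistent with an index.html replacement that is not anchored to the start of a path segment. The two functions below contrast an unanchored pattern with an anchored one. Both are hypothetical illustrations, and the regex the plugin actually used may differ.

```javascript
// Unanchored: removes any trailing "index.html", even as part of a longer file name.
function collapseUnanchored(relativePath) {
    return relativePath.replace(/index\.html?$/, '');
}

// Anchored: removes "index.html" only when it is the whole file name.
function collapseAnchored(relativePath) {
    return relativePath.replace(/(^|\/)index\.html?$/, '$1');
}

var truncated = collapseUnanchored('something_index.html');
var preserved = collapseAnchored('something_index.html');
var directory = collapseAnchored('docs/index.html');
```

With the anchored pattern, `something_index.html` keeps its full name while `docs/index.html` still collapses to `docs/`.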

Sub-directories structures not taken in account

Tried this:

Directory structure:

$ mkdir posts; touch index.html ; touch posts/foo.html
$ tree
.
├── [me              0]  index.html
├── [me            102]  posts
│   └── [me              0]  foo.html

gulpfile.js:

var gulp =     require('gulp'),
    sitemap =  require('gulp-sitemap')
    ;

var website = './';
var paths = {
    html: [website + '*.html', website + 'posts/*.html']
};

gulp.task('default', function () {
    gulp.src(paths.html)
        .pipe(sitemap({
            siteUrl: 'http://domain.com'
        }))
        .pipe(gulp.dest(website))
});

Sitemap output:

$ cat sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://domain.com/</loc>
        <lastmod>2015-04-24T14:37:42.000Z</lastmod>
    </url>
    <url>
        <loc>http://domain.com/foo.html</loc> <-- 'posts' subdirectory is missing in path
        <lastmod>2015-04-24T14:37:55.000Z</lastmod>
    </url>
</urlset>
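This output is what one would expect when each glob gets its own base directory. gulp derives a file's relative path from the non-glob part of the pattern that matched it, so `posts/*.html` yields `foo.html`. The sketch below imitates that with Node's `path` module, using `/site` as a made-up project root.

```javascript
var path = require('path');

// Made-up absolute path standing in for a file in a project at /site
var file = '/site/posts/foo.html';

// Base taken from the glob 'posts/*.html' (its non-glob part)
var perGlobBase = path.posix.relative('/site/posts', file);

// Base set explicitly to the project root
var sharedBase = path.posix.relative('/site', file);
```

Passing a shared `base` option to `gulp.src` (for example `{ base: website }`), or using a single `**/*.html` glob, is the usual way to keep the subdirectory in the relative path.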

Feature request: specify the xhtml namespace if hreflang is used

Hey, thanks for your module.

I would like to suggest adding the xmlns:xhtml="http://www.w3.org/1999/xhtml" namespace to the <urlset> element when the hreflang option is used, like this: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">.

I did not find a way to do that in your module...

How to replace <urlset> on multi-language sitemap error

I've created a new sitemap with hreflang alternate links. On default export, the sitemap fails validation and renders as unstyled text in the browser.

On default export, my <urlset> contents are:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">

However, I can get it working correctly with the following:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.w3.org/TR/xhtml11/xhtml11_schema.html http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/TR/xhtml11/xhtml11_schema.html">

What is the best way to replace this line in the sitemap, or am I missing a setting that I can use to fix this?

Return original files to the stream

More of a suggestion than an issue, but I think to fix up a lot of builds this needs to be done.

Basically, we're passing HTML files into gulp-sitemap to be processed and it's outputting sitemap.xml, which we write to the disk with gulp.dest('test/'). The issue is that once we've done this, we can't chain anymore as the file in the stream has changed from an HTML file to sitemap.xml i.e.

gulp.src('*.html')
    .pipe(minifyHtml()) // Returns minified HTML
    .pipe(sitemap()) // Returns sitemap.xml
    .pipe(w3cjs())  // Broken - required HTML but given sitemap.xml

I think the best way to solve this is by introducing a separate property for outputting the Sitemap, e.g.

gulp.src('*.html')
    .pipe(minifyHtml()) // Returns minified HTML
    .pipe(sitemap({ output: 'test/sitemap.xml' })) // Returns HTML
    .pipe(w3cjs())  // Returns validated HTML

I don't mind submitting a PR for this as I wanted to add metadata writing in too (link rel="sitemap"...), just wanted to get your thoughts first.

Getting template URLs instead of URLs

When I run gulp-sitemap, I get a list of all my HTML pages in sitemap.xml. These are of no use to me; what I want are the proper URLs that are defined in my app.js for these HTML pages. How can I get gulp-sitemap to output the exact URLs that open in the browser?

Remove alternate tag if the href is undefined

Hi!! I've got multiple languages on my site and most of them would have their equivalent URL for the corresponding language. However, in some cases, the URL is only available in one or two languages.

Example:

<url>
        <loc>https://mydomain.com/legal/cookies-policy/</loc>
        <lastmod>2023-02-27T06:46:41.122Z</lastmod>
        <xhtml:link rel="alternate" hreflang="es" href="undefined" />
        <xhtml:link rel="alternate" hreflang="pt" href="undefined" />
        <xhtml:link rel="alternate" hreflang="fr" href="undefined" />
        <xhtml:link rel="alternate" hreflang="de" href="undefined" />
        <xhtml:link rel="alternate" hreflang="ru" href="undefined" />
</url>

Is there any way of avoiding that?
Seeing the code, only pushing the tag if href is defined seems like a quick fix. I could submit a PR if needed.
Thanks.
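The quick fix proposed here, emitting an alternate only when its href is defined, might look roughly like this. `definedAlternates` is a hypothetical helper rather than code from the plugin. The `getHref(siteUrl, file, lang, loc)` signature follows the README, and the translated URL is invented for the example.

```javascript
// Hypothetical step: keep only alternates whose href resolved to a non-empty string.
function definedAlternates(hreflang, siteUrl, file, loc) {
    return hreflang
        .map(function (item) {
            return {
                lang: item.lang,
                href: item.getHref(siteUrl, file, item.lang, loc)
            };
        })
        .filter(function (alt) {
            return typeof alt.href === 'string' && alt.href.length > 0;
        });
}

// Illustrative data: only Spanish has a translated cookies page.
var translated = { es: 'https://mydomain.com/es/legal/politica-de-cookies/' };
var hreflang = ['es', 'pt', 'fr'].map(function (code) {
    return {
        lang: code,
        getHref: function (siteUrl, file, lang, loc) {
            return translated[lang]; // undefined for languages without a translation
        }
    };
});

var alternates = definedAlternates(
    hreflang,
    'https://mydomain.com',
    'legal/cookies-policy/index.html',
    'https://mydomain.com/legal/cookies-policy/'
);
```

Only the `es` entry survives, so no `href="undefined"` links would be written for the other languages.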

How to ignore some files?

Is there a way to ignore some files?

I use email.html as an HTML email template for a simple PHP form. It sits in the root of the project (the same folder as index.html), and I don't want that file to be added to the sitemap.

Using PHP files as index

Great job on this plugin, I am loving how it works so far. The only issue I am running into is that I generally use PHP files instead of HTML, so when I build the sitemap I get domain.com/index.php instead of just domain.com. I have tried using your regex pattern in the loc.replace function, at first just trying to remove the .com, but I can't get it to apply to only the one URL; it applies to all of them. I also tried doing it with mappings, but that is not working either. Here is my code:

.pipe(sitemap({
	siteUrl:siteUrl,
	mappings: [{
		pages: ['dev/index.php'],
		getLoc(siteUrl, loc, entry) {
			return loc.replace(/\.\w+$/, '');
		}
	}]
}))

Is there any support for php files in the works or is there a better way to accomplish this? Any help would be greatly appreciated.
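One possible shape for such a getLoc is a replacement anchored on the file name, so that only a trailing index.php is affected and other pages are left alone. `phpGetLoc` is a hypothetical name, and the sketch assumes `loc` arrives as a full URL, which may not hold for every plugin version.

```javascript
// Hypothetical getLoc for PHP sites: turn a trailing index.php into a directory URL.
function phpGetLoc(siteUrl, loc, entry) {
    return loc.replace(/(^|\/)index\.php$/, '$1');
}

var home = phpGetLoc('http://domain.com', 'http://domain.com/index.php', {});
var nested = phpGetLoc('http://domain.com', 'http://domain.com/shop/index.php', {});
var other = phpGetLoc('http://domain.com', 'http://domain.com/contact.php', {});
```

When used inside mappings, it may also be worth checking that the `pages` pattern matches the path as the plugin sees it, since a prefix such as `dev/` only matches if it is part of the file's relative path.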

hreflang support

It would be a nice new feature to optionally generate hreflang tags in the sitemap. To keep it usable, I assume the same filename in all languages, with only the paths differing.

def-hreflangs {
"mydomain.com/ru/", "ru"
"mydomain.com/de/", "de"
"mydomain.com/en/", "en"
}

You look up which of them matches the current file path, and write all the other hreflang entries into the sitemap.

<url>
  <some other tags>
  <xhtml:link rel="alternate" hreflang="ru" href="http://www.domain.ru/client/page1.html/"/>
  <xhtml:link rel="alternate" hreflang="en" href="http://www.domain.en/client/page1.html/"/>
</url>

Suggestion: verbose option

It would be nice to have some information about the task, at least the number of handled files.
As a second level, the processed file names, e.g. folder/filename.

Thx, klaus

Enable support for lazy images

@pgilad I came across a very specific problem in the project I'm working on: I'm loading some images through lazy-load. If you agree, I'd like to create a small interface for working with images, so that I can build a regex that collects the URL from a specified attribute, something like:

images: {
    lazyLoadAttribute: 'data-src'
}

What do you think?

Suggestion: Make priority accept a function

Hi,
since my pages have different priority, it would be great to make the priority option accept a function as well (much like lastmod), something like:

priority: function (loc) {
    // Give pages inside root path (i.e. no slashes) a higher priority.
    // split() always returns at least one element, so "no slashes" means length 1.
    return loc.split('/').length === 1 ? 1 : 0.5;
}

That would imo be more convenient than what I am currently using:

priority: 0.6, // Default priority
mappings: [{
        pages: 'index.html', // Landing page
        priority: 1
    }, {
        pages: '*/index.html', // Other 'first level' index pages
        priority: 0.9
}]

Globbed negations ignored

Attempting to exclude the .html files under bower_components and node_modules from the sitemap fails.

gulp.task('sitemap', function () {
    return gulp.src(['**/*.html', '!node_modules/', '!bower_components/'])
        .pipe(sitemap({
            siteUrl: 'https://scotthsmith.com',
            pages: ['!node_modules/', '!bower_components/']
        }))
        .pipe(gulp.dest('./'));
});

Both attempts at removing these source files from the output are unsuccessful.

.com removed

First off, thanks for the great plugin. I wanted to open this issue in case this is a minor bug.

The .com portion of my domain is also getting removed when using the remove extension example code. I'm not sure if I'm doing something wrong in my gulpfile.js or if this is a bug.

My setup is:

gulp.task('sitemap', function () {
  return gulp.src(['./dist/**/*.html','!./dist/exceptions/*','!./dist/google*.html'], {
    read: false
  })
    .pipe(plugins.sitemap({
      siteUrl: 'https://vanceauto.com',
      getLoc: function(siteUrl, loc, entry) {
          return loc.substr(0, loc.lastIndexOf('.')) || loc; // Removes the file extension
      }
  }))
    .pipe(gulp.dest('./dist/'));
});

The generated sitemap.xml looks like:

<url>
<loc>https://domain</loc>
<lastmod>2016-11-13T04:20:07.000Z</lastmod>
</url>
<url>
<loc>https://domain.com/inventory</loc>
<lastmod>2016-11-13T04:20:07.000Z</lastmod>
</url>
