
getseositemap's Introduction

getSeoSitemap v5.0.0 | 2023-02-27

PHP library to generate a sitemap.
It crawls a whole domain, checking all URLs.
It performs Search Engine Optimization checks only on URLs included in the sitemap.

Please support this project by making a donation via PayPal or via Bitcoin (BTC) to the address 19928gKpqdyN6CHUh4Tae1GW9NAMT6SfQH

Warning

Before moving from releases lower than 4.1.1 to 4.1.1 or higher, you must drop the getSeoSitemap and getSeoSitemapExec tables from your database.

Overview
This script creates a full gzip sitemap, or multiple gzip sitemaps plus a gzip sitemap index.
It includes change frequency, last modification date and priority, set according to your own rules.
Change frequency is selected automatically among daily, weekly, monthly and yearly.
Max URL length is 767 characters, otherwise the script will fail.
Max page size is 16777215 bytes, otherwise the script will fail.
URLs with an http response code other than 200 or with a size of 0 will not be included in the sitemap.
It checks all internal and external links inside html pages and js sources (href URLs in 'a' tags, plus form action URLs when the method is get).
It checks all internal and external sources.
Mailto URLs will not be included in the sitemap.
URLs inside pdf files will not be scanned and will not be included in the sitemap.
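For reference, a gzip sitemap index of the kind described above generally follows the sitemaps.org protocol; the sketch below is illustrative only (file names, domain and dates are placeholders, not actual getSeoSitemap output):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2023-02-27</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2023-02-27</lastmod>
  </sitemap>
</sitemapindex>
```

Each referenced sitemap file then lists the individual URLs with their loc, lastmod, changefreq and priority elements.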

getSeoSitemapBot is a crawler like Googlebot, and it does not execute JavaScript.
That means it does not follow URLs created by JavaScript.
On https://support.google.com/webmasters/answer/2409684?hl=en Google says:
".....
Some features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash can make it difficult for search engines to crawl your site.
Check the following:
Use a text browser such as Lynx to examine your site, since many search engines see your site much as Lynx would.
If features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
....."

To improve SEO, following the robots.txt rules for "User-agent: *", it checks:

  • the http response code of all internal and external sources in the domain (images, scripts, links, iframes, videos, audios)
  • malformed URLs in the domain
  • the page title of URLs in the domain
  • the page description of URLs in the domain
  • the page h1/h2/h3 tags of URLs in the domain
  • the page size of URLs in the sitemap
  • the image alt attributes of URLs in the domain
  • the image title attributes of URLs in the domain.

You can use absolute or relative URLs inside the site.
This script automatically sets all URLs to skip or to allow in the sitemap, following the robots.txt rules for "User-agent: *" and the robots tag in each page head.
There is no automatic function to submit the updated sitemap to search engines.
The sitemap will be saved in the main directory of the domain.
It rewrites robots.txt, adding updated sitemap information.
The maximum number of URLs that can be inserted into the sitemap is 2.5T.
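After a successful run, the rewritten robots.txt would typically end with a Sitemap directive along the lines of this sketch (the domain, the Disallow rule and the file name are placeholders, not guaranteed getSeoSitemap output):

```
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml.gz
```

Search engines that fetch robots.txt discover the sitemap location from this line, which is why the file must live in the main directory of the site.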

Other main features:

  • backup of all previous sitemaps into the bak folder.
  • it repeats the URL scan once, after 5 seconds, when the http response code is not 200.
  • it prevents saving the sitemap if the total URL percentage difference from the previous successful run is more than a preset value.

Using getSeoSitemap, you will be able to give your visitors a better browsing experience.

Requirements

  • PHP 8.0.
  • MariaDB 10.4.

Instructions
1 - copy the getSeoSitemap folder into a protected area of your server.
2 - set all user parameters in config.php.
3 - schedule the script in your server crontab once a day, preferably when your server is not too busy.
A command line example to schedule the script every day at 7:45 AM is:
45 7 * * * php /example/example/example/example/example/getSeoSitemap/getSeoSitemap.php
Once you know how long the full script takes to execute, you could add a crontab timeout.
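One way to implement such a timeout is the GNU coreutils timeout command; the crontab line below is a hypothetical sketch (the 3600-second limit is an example value, and the path is the same placeholder used above):

```
# kill the run if it exceeds one hour (3600 s); adjust to your measured run time
45 7 * * * timeout 3600 php /example/example/example/example/example/getSeoSitemap/getSeoSitemap.php
```

Note that killing a run mid-scan may leave the exec flag set in the getSeoSitemapExec table, so pick a limit comfortably above the normal run time.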

Warning
Do not save any file whose name starts with sitemap in the main directory, otherwise the getSeoSitemap script could delete it.
The robots.txt file must be present in the main directory of the site, otherwise getSeoSitemap will fail.
In case of FPM timeout errors, you should fix them by setting pm.process_idle_timeout to 30s or higher.
To run getSeoSitemap faster, if you use a script like Geoplugin you should exclude the getSeoSitemapBot user-agent from it.
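The FPM setting mentioned above lives in the PHP-FPM pool configuration; this fragment is a sketch (the pool file path varies by distribution, and 30s is the minimum value the warning suggests):

```ini
; e.g. /etc/php/8.0/fpm/pool.d/www.conf (path varies by distribution)
pm.process_idle_timeout = 30s
```

Reload PHP-FPM after changing the pool file for the setting to take effect.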

getseositemap's People

Contributors

johnbe4

getseositemap's Issues

Fatal error: Uncaught Error: Call to a member function close()

Hello sir, first of all congratulations on this tool. Thanks <3
I can't get it to work on my server, I get this error.

Fatal error: Uncaught Error: Call to a member function close() on null in /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php:355 Stack trace: #0 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(687): getSeoSitemap->closeMysqliStmt() #1 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(159): getSeoSitemap->end() #2 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(2385): getSeoSitemap->start() #3 {main} thrown in /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php on line 355

and log

[2021-08-02 13:18:34.131600] ## getSeoSitemap v4.1.0
[2021-08-02 13:18:34.132000] ## Execution start
[2021-08-02 13:18:34.140000] ## Scan start
[2021-08-02 13:18:34.142900] Renamed previous backup sitemap
[2021-08-02 13:18:34.143100] Saved backup sitemap
[2021-08-02 13:18:34.143300] Deleted previous backup sitemap
[2021-08-02 13:18:34.278400] ## Scan end

[2021-08-02 13:18:34.279200] Deleted old URLs
[2021-08-02 13:18:34.279900] 1 scanned URLs
[2021-08-02 13:18:34.280000] 0 URLs without title into domain (SEO: title should be present)
[2021-08-02 13:18:34.280100] 0 URLs with multiple title into domain (SEO: title should be single)
[2021-08-02 13:18:34.280300] 0 URLs without description into domain (SEO: description should be present)
[2021-08-02 13:18:34.280300] 0 URLs with multiple description into domain (SEO: description should be single)
[2021-08-02 13:18:34.280400] 0 URLs without h1 into domain (SEO: h1 should be present)
[2021-08-02 13:18:34.280500] 0 URLs with multiple h1 into domain (SEO: h1 should be single)
[2021-08-02 13:18:34.280500] 0 URLs without h2 into domain (SEO: h2 should be present)
[2021-08-02 13:18:34.280600] 0 URLs without h3 into domain (SEO: h3 should be present)

If I run again I get this other in the log
[2021-08-02 13:22:02.808900] An error has occurred: execution has been stopped; maybe the previous scan was not ended correctly. Double-check log to fix it. - fix it remembering to set exec to n in getSeoSitemapExec table.
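Following the hint in that log line, the stuck flag can be reset manually; the statement below is a sketch that assumes the column is literally named exec and holds 'y'/'n' values, which should be verified against the actual schema first:

```sql
-- hypothetical: reset the running flag after an interrupted scan
UPDATE getSeoSitemapExec SET exec = 'n';
```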

MySQL 5.7 Strict Mode Error

Hello,

I am seeing database errors with MySQL 5.7. I've set the default value of all the fields to NULL.

How to fix this error? Thanks.

2019-07-05 00:43:43 - Execution has been stopped because of MySQL execute error: Data truncated for column 'changefreq' at row 1.
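"Data truncated" errors like this typically mean the value being written does not fit the column definition, and MySQL strict mode turns the silent truncation into a hard error. A hedged way to investigate (these statements are generic MySQL diagnostics, not a documented getSeoSitemap fix):

```sql
-- inspect the column definition and the active SQL mode
SHOW COLUMNS FROM getSeoSitemap LIKE 'changefreq';
SELECT @@GLOBAL.sql_mode;
```

If the column is an ENUM or a too-short VARCHAR, widening it to hold values such as 'daily', 'weekly', 'monthly' and 'yearly' is preferable to relaxing strict mode server-wide.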

Skipped URLs

Hello,

I indexed a page and all the URLs are skipped. The page contains absolute URLs. The log lists all the skipped URLs, and the database shows the skipped URLs with a 200 response code. I've set $skipUrl to some random characters. What could be the reason?

Here's the URL

https://www.hobotraveler.com/videos/

curl_exec failed calling URL #carousel

2017-11-25 03:32:47 - ## Execution start
2017-11-25 03:32:47 - ## Scan start
2017-11-25 03:32:47 - ## Scan end
2017-11-25 03:32:47 - 1 scanned URLs (skipped URLs are not included - failed URLs are included)

2017-11-25 03:32:52 - Execution has been stopped because of curl_exec failed calling URL #carousel
