
getseositemap's Introduction

getSeoSitemap v5.0.0 | 2023-02-27

PHP library to generate a sitemap.
It crawls a whole domain, checking all URLs.
It performs Search Engine Optimization checks only on URLs included in the sitemap.

Please support this project by making a donation via PayPal or via Bitcoin (BTC) to the address 19928gKpqdyN6CHUh4Tae1GW9NAMT6SfQH

Warning

Before moving from releases lower than 4.1.1 to 4.1.1 or higher, you must drop the getSeoSitemap and getSeoSitemapExec tables from your database.

Overview
This script creates a full gzip sitemap, or multiple gzip sitemaps plus a gzip sitemap index.
It includes change frequency, last modification date and priority, set according to your own rules.
Change frequency is selected automatically among daily, weekly, monthly and yearly.
Max URL length is 767 characters, otherwise the script will fail.
Max page size is 16777215 bytes, otherwise the script will fail.
URLs with an http response code other than 200 or with a size of 0 will not be included in the sitemap.
It checks all internal and external links inside html pages and js sources (href URLs in 'a' tags, plus form action URLs when the method is get).
It checks all internal and external sources.
Mailto URLs will not be included in the sitemap.
URLs inside pdf files will not be scanned and will not be included in the sitemap.
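For reference, a gzip sitemap index of the kind described above generally follows the sitemaps.org protocol; the sketch below is illustrative only (file names, domain and dates are placeholders, not actual getSeoSitemap output):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2023-02-27</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2023-02-27</lastmod>
  </sitemap>
</sitemapindex>
```

Each referenced sitemap file then lists the individual URLs with their loc, lastmod, changefreq and priority elements.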

getSeoSitemapBot is a crawler like Googlebot, and it does not execute JavaScript.
That means it does not follow URLs created by JavaScript.
On https://support.google.com/webmasters/answer/2409684?hl=en Google says:
".....
Some features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash can make it difficult for search engines to crawl your site.
Check the following:
Use a text browser such as Lynx to examine your site, since many search engines see your site much as Lynx would.
If features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
....."

To improve SEO, following the robots.txt rules for "User-agent: *", it checks:

  • the http response code of all internal and external sources in the domain (images, scripts, links, iframes, videos, audios)
  • malformed URLs in the domain
  • the page title of URLs in the domain
  • the page description of URLs in the domain
  • the page h1/h2/h3 tags of URLs in the domain
  • the page size of URLs in the sitemap
  • the image alt attributes of URLs in the domain
  • the image title attributes of URLs in the domain.

You can use absolute or relative URLs inside the site.
This script automatically sets all URLs to skip or to allow in the sitemap, following the robots.txt rules for "User-agent: *" and the robots tag in each page head.
There is no automatic function to submit the updated sitemap to search engines.
The sitemap will be saved in the main directory of the domain.
It rewrites robots.txt, adding updated sitemap information.
The maximum number of URLs that can be inserted into the sitemap is 2.5T.
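After a successful run, the rewritten robots.txt would typically end with a Sitemap directive along the lines of this sketch (the domain, the Disallow rule and the file name are placeholders, not guaranteed getSeoSitemap output):

```
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml.gz
```

Search engines that fetch robots.txt discover the sitemap location from this line, which is why the file must live in the main directory of the site.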

Other main features:

  • backup of all previous sitemaps into the bak folder.
  • it repeats the URL scan once, after 5 seconds, when the http response code is not 200.
  • it prevents saving the sitemap if the total URL percentage difference from the previous successful run is more than a preset value.

Using getSeoSitemap, you will be able to give your visitors a better browsing experience.

Requirements

  • PHP 8.0.
  • MariaDB 10.4.

Instructions
1 - copy the getSeoSitemap folder into a protected area of your server.
2 - set all user parameters in config.php.
3 - schedule the script in your server crontab once a day, preferably when your server is not too busy.
A command line example to schedule the script every day at 7:45 AM is:
45 7 * * * php /example/example/example/example/example/getSeoSitemap/getSeoSitemap.php
Once you know how long the full script takes to execute, you could add a crontab timeout.
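One way to implement such a timeout is the GNU coreutils timeout command; the crontab line below is a hypothetical sketch (the 3600-second limit is an example value, and the path is the same placeholder used above):

```
# kill the run if it exceeds one hour (3600 s); adjust to your measured run time
45 7 * * * timeout 3600 php /example/example/example/example/example/getSeoSitemap/getSeoSitemap.php
```

Note that killing a run mid-scan may leave the exec flag set in the getSeoSitemapExec table, so pick a limit comfortably above the normal run time.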

Warning
Do not save any file whose name starts with sitemap in the main directory, otherwise the getSeoSitemap script could delete it.
The robots.txt file must be present in the main directory of the site, otherwise getSeoSitemap will fail.
In case of FPM timeout errors, you should fix them by setting pm.process_idle_timeout to 30s or higher.
To run getSeoSitemap faster, if you use a script like Geoplugin you should exclude the getSeoSitemapBot user-agent from it.
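The FPM setting mentioned above lives in the PHP-FPM pool configuration; this fragment is a sketch (the pool file path varies by distribution, and 30s is the minimum value the warning suggests):

```ini
; e.g. /etc/php/8.0/fpm/pool.d/www.conf (path varies by distribution)
pm.process_idle_timeout = 30s
```

Reload PHP-FPM after changing the pool file for the setting to take effect.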

getseositemap's People

Contributors

johnbe4

getseositemap's Issues

Fatal error: Uncaught Error: Call to a member function close()

Hello sir, first of all congratulations on this tool. Thanks <3
I can't get it to work on my server, I get this error.

Fatal error: Uncaught Error: Call to a member function close() on null in /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php:355 Stack trace: #0 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(687): getSeoSitemap->closeMysqliStmt() #1 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(159): getSeoSitemap->end() #2 /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php(2385): getSeoSitemap->start() #3 {main} thrown in /home/lubipe/public_html/assets/getSeoSitemap/getSeoSitemap.php on line 355

and log

[2021-08-02 13:18:34.131600] ## getSeoSitemap v4.1.0
[2021-08-02 13:18:34.132000] ## Execution start
[2021-08-02 13:18:34.140000] ## Scan start
[2021-08-02 13:18:34.142900] Renamed previous backup sitemap
[2021-08-02 13:18:34.143100] Saved backup sitemap
[2021-08-02 13:18:34.143300] Deleted previous backup sitemap
[2021-08-02 13:18:34.278400] ## Scan end

[2021-08-02 13:18:34.279200] Deleted old URLs
[2021-08-02 13:18:34.279900] 1 scanned URLs
[2021-08-02 13:18:34.280000] 0 URLs without title into domain (SEO: title should be present)
[2021-08-02 13:18:34.280100] 0 URLs with multiple title into domain (SEO: title should be single)
[2021-08-02 13:18:34.280300] 0 URLs without description into domain (SEO: description should be present)
[2021-08-02 13:18:34.280300] 0 URLs with multiple description into domain (SEO: description should be single)
[2021-08-02 13:18:34.280400] 0 URLs without h1 into domain (SEO: h1 should be present)
[2021-08-02 13:18:34.280500] 0 URLs with multiple h1 into domain (SEO: h1 should be single)
[2021-08-02 13:18:34.280500] 0 URLs without h2 into domain (SEO: h2 should be present)
[2021-08-02 13:18:34.280600] 0 URLs without h3 into domain (SEO: h3 should be present)

If I run again I get this other in the log
[2021-08-02 13:22:02.808900] An error has occurred: execution has been stopped; maybe the previous scan was not ended correctly. Double-check log to fix it. - fix it remembering to set exec to n in getSeoSitemapExec table.
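Following the hint in that log line, the stuck flag can be reset manually; the statement below is a sketch that assumes the column is literally named exec and holds 'y'/'n' values, which should be verified against the actual schema first:

```sql
-- hypothetical: reset the running flag after an interrupted scan
UPDATE getSeoSitemapExec SET exec = 'n';
```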

MySQL 5.7 Strict Mode Error

Hello,

I am seeing database errors with MySQL 5.7. I've set the default value of all the fields to NULL.

How to fix this error? Thanks.

2019-07-05 00:43:43 - Execution has been stopped because of MySQL execute error: Data truncated for column 'changefreq' at row 1.
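"Data truncated" errors like this typically mean the value being written does not fit the column definition, and MySQL strict mode turns the silent truncation into a hard error. A hedged way to investigate (these statements are generic MySQL diagnostics, not a documented getSeoSitemap fix):

```sql
-- inspect the column definition and the active SQL mode
SHOW COLUMNS FROM getSeoSitemap LIKE 'changefreq';
SELECT @@GLOBAL.sql_mode;
```

If the column is an ENUM or a too-short VARCHAR, widening it to hold values such as 'daily', 'weekly', 'monthly' and 'yearly' is preferable to relaxing strict mode server-wide.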

Skipped URLs

Hello,

I indexed a page and all the URLs are skipped. The page contains absolute URLs. The log lists all the skipped URLs, and the database shows the skipped URLs with a 200 response code. I've set $skipUrl to some random characters. What could be the reason?

Here's the URL

https://www.hobotraveler.com/videos/

curl_exec failed calling URL #carousel

2017-11-25 03:32:47 - ## Execution start
2017-11-25 03:32:47 - ## Scan start
2017-11-25 03:32:47 - ## Scan end
2017-11-25 03:32:47 - 1 scanned URLs (skipped URLs are not included - failed URLs are included)

2017-11-25 03:32:52 - Execution has been stopped because of curl_exec failed calling URL #carousel
