
glitcherbot's People

Contributors

alex-moreno, ashwinparmar, iainp999, rpkoller, tguilpain


glitcherbot's Issues

Run crawler inside Docker

At the moment the only ways to run the crawler are via Vagrant or by installing the dependencies locally.

Ideally we'd have a way to run the crawler inside a Docker container, so dependencies are reduced as much as possible if you choose to use Docker instead of Vagrant.
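A minimal sketch of what such an image could look like, using the existing bin/visual_regression_bot.php entry point; the PHP version, extensions and file layout are assumptions, not taken from the project:

  # Sketch only: base image and required extensions are assumptions.
  FROM php:8.1-cli

  # Composer is needed to install the project's dependencies.
  COPY --from=composer:2 /usr/bin/composer /usr/bin/composer

  WORKDIR /app
  COPY . /app
  RUN composer install --no-interaction --no-dev

  # Run the crawler; the CSV would be baked into the image or mounted at run time.
  ENTRYPOINT ["php", "bin/visual_regression_bot.php"]
  CMD ["bot:crawl-sites", "sites.csv"]

With something like that in place, docker build -t glitcherbot . followed by docker run --rm glitcherbot would run the crawler with no local PHP or Composer install needed.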

Show a howto page instead of a broken one

On a brand new install, or when the database is empty because there has been no crawl yet, most of the pages will display an error.

Instead it would be great to show what the user needs to do in order to start seeing things happen.

Highlighting of changes

When a status code or size changes between crawls, this should be highlighted in the table of results.

For the size, a threshold should be used to define the acceptable tolerance.

We can use Bootstrap classes for the highlighting:

  • table-danger if the size changes outside of tolerances or if the status code changes
  • table-warning if the size changes but is inside tolerances
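A minimal sketch of how the class could be picked; the 10% tolerance and the result array keys are assumptions, not the project's actual API:

  // Sketch: pick the Bootstrap row class from the previous and current results.
  function rowClass(array $previous, array $current, float $tolerance = 0.10): string {
      if ($previous['status'] !== $current['status']) {
          return 'table-danger';
      }
      $delta = abs($current['size'] - $previous['size']);
      if ($previous['size'] > 0 && ($delta / $previous['size']) > $tolerance) {
          return 'table-danger';
      }
      return $delta > 0 ? 'table-warning' : '';
  }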

Add logging

Push any error from the process into a log file, perhaps somewhere under /etc/*.
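A rough sketch of what this could look like if Monolog were added as a dependency (not currently confirmed in the project); the log path is only an example, and /var/log is the more conventional location for logs than /etc:

  <?php
  // Sketch assuming Monolog were a dependency; path and channel name are examples.
  require __DIR__ . '/vendor/autoload.php';

  use Monolog\Logger;
  use Monolog\Handler\StreamHandler;

  $logger = new Logger('glitcherbot');
  $logger->pushHandler(new StreamHandler('/var/log/glitcherbot/crawler.log', Logger::WARNING));

  $url = 'https://www.production.com/test-page';
  try {
      // the real crawling call would go here
      throw new \RuntimeException('simulated crawl failure');
  } catch (\Throwable $e) {
      $logger->error('Crawl failed', ['url' => $url, 'exception' => $e->getMessage()]);
  }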

PHP Errors and Warnings

Describe the bug

There are a few notices and warnings being produced by the tool when in use. For example:

Notice: Undefined index: date1 in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined index: date2 in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined variable: persistNaughty in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined variable: persistLatest in src/ScraperBot/Routing/Controllers/SitesController.php on line 84

To Reproduce

Errors are produced on most pages of the site, but also when using the command line interface.

An error is also produced by a missing favicon.ico file.

Expected behavior

No errors should be produced during the normal use of the tool.
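A possible fix sketch for the notices listed above; how SitesController actually reads its parameters isn't shown here, so the superglobal access is an assumption:

  // Default the request parameters and flags before they are used
  // (around line 84 of SitesController in the report above).
  $date1 = $_GET['date1'] ?? null;
  $date2 = $_GET['date2'] ?? null;
  $persistNaughty = $persistNaughty ?? false;
  $persistLatest  = $persistLatest ?? false;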

Compare two environments with different base urls

At the moment the comparison between crawls is done using the whole URL.

It would be useful to select specific crawls for which we compare just the final part of the URL, so we can have different environments, e.g. stage and prod, and compare results between the two.
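One way this could work is to key crawl results by URL path rather than by full URL, so stage and prod entries for the same page line up; a minimal sketch (the function name and normalisation rules are illustrative only):

  // Reduce a URL to its path so the same page matches across environments.
  function pathKey(string $url): string {
      $path = parse_url($url, PHP_URL_PATH) ?? '/';
      return rtrim($path, '/') ?: '/';
  }

  // Both of these map to "/test-page":
  // pathKey('https://www.production.com/test-page');
  // pathKey('https://www.development.com/test-page/');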

Suggest an easier way to get up and running with the reporting tool

Whilst I think the inclusion of a full Vagrant/Docker setup is a good idea, I think that providing a simpler way to get up and running might be a good thing and create a lower barrier to entry. I had some trouble getting the Vagrant commands to run correctly on my setup, so I resorted to an alternative method that worked well.

Can I therefore suggest alternate instructions that allow the site to be served directly using the built-in PHP server? All that's needed is the following command.

php -S 0.0.0.0:8000 -t html html/index.php

I was able to get the site up and running using this and view my statistics. This command can be wrapped in the scripts section of composer.json so it can be run easily without having to remember it. Adding this:

  "scripts": {
    "start" : "php -S 0.0.0.0:8000 -t public public/index.php",
  }

Allows the following to be run:

composer start

This does assume that the user has all of the local dependencies installed, which are nicely detailed in the composer.json file :)

Workflow for comparing two environments

Thank you for working on a non-js based regression tool. 👍

I am having trouble diffing two different environments, I am assuming the workflow is the following:

  1. Create two sample CSV files: one with production URLs and the other with the corresponding URLs from your development environment, so that the same page appears in each CSV, e.g. www.production.com/test-page and www.development.com/test-page (see the sample files after this list)
  2. Run php bin/visual_regression_bot.php bot:crawl-sites production-site.csv
  3. Run php bin/visual_regression_bot.php bot:crawl-sites development-site.csv
  4. Run php bin/visual_regression_bot.php bot:compare-crawls production-site.csv development-site.csv
  5. Go to http://localhost/ and click on "changed sites"
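If that workflow is right, the two files from step 1 might look something like this (the exact column layout the bot expects isn't confirmed here; one URL per line and a second example page are assumed):

  production-site.csv:
  www.production.com/test-page
  www.production.com/contact

  development-site.csv:
  www.development.com/test-page
  www.development.com/contact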

Add a plug-in system

If we use some plugin or API kind of mechanism, we'd be able to trigger snapshots when, for example, we detect that a threshold has been passed.
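A rough sketch of the kind of hook such a plug-in system could expose; the interface and class names are purely illustrative:

  // Illustrative plug-in hook: called after each page is crawled.
  interface CrawlEventSubscriber {
      public function onPageCrawled(array $previous, array $current): void;
  }

  // Example plug-in that would trigger a snapshot once a size threshold is passed.
  final class SnapshotOnThreshold implements CrawlEventSubscriber {
      public function onPageCrawled(array $previous, array $current): void {
          $delta = abs($current['size'] - $previous['size']);
          if ($previous['size'] > 0 && $delta / $previous['size'] > 0.10) {
              // take/queue a snapshot of $current['url'] here
          }
      }
  }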

Indexing sites in db

At the moment the sites are only kept in the CSV or the JSON. When doing the first crawl we need to store those sites and their indexes in a new table.
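A sketch of the kind of table that could hold the indexed sites; the SQLite syntax and column names are assumptions about the schema:

  <?php
  // Assumed schema: one row per site, with a numeric index and the URL.
  $pdo = new PDO('sqlite:glitcherbot.sqlite');
  $pdo->exec('CREATE TABLE IF NOT EXISTS sites (
      site_index INTEGER PRIMARY KEY,
      url        TEXT NOT NULL UNIQUE
  )');

  // Populate the table from the CSV/JSON list on the first crawl.
  $insert = $pdo->prepare('INSERT OR IGNORE INTO sites (url) VALUES (:url)');
  $insert->execute([':url' => 'www.production.com/test-page']);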

Improve output

Potentially use ncurses for the terminal output, and maybe clean up the output by moving the detail into a log file.

Allow users to use the crawler on the browser

I think a nice feature would be to provide a UI similar to https://validator.w3.org/ where users can either paste a list of URLs or upload a CSV or JSON file to be processed.

Tasks:

  1. Provide an upload field to let users upload a CSV or JSON file.
  2. A textarea field to allow people to paste a list of URLs.
  3. Add validation of the data in the backend (see the sketch after this list).
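A minimal sketch of the backend validation in task 3, assuming the pasted URLs arrive as a newline-separated POST field called urls (a hypothetical name):

  <?php
  // Split the pasted text into lines and validate each URL.
  $lines = preg_split('/\R+/', trim($_POST['urls'] ?? ''), -1, PREG_SPLIT_NO_EMPTY);

  $valid = [];
  $invalid = [];
  foreach ($lines as $line) {
      $url = trim($line);
      if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
          $valid[] = $url;
      } else {
          $invalid[] = $url;
      }
  }
  // $valid can be queued for crawling; $invalid reported back to the user.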

Capture errors in the database

When any errors happen, it would be good to log them in the database, so that when drilling down on a site we can find what went wrong at the moment we were trying to fetch it.

Automatic Sitemap crawling

This is a TODO.

Ideally we should also be able to provide a sitemap, so the user can simply give a main URL and the bot can crawl everything under that URL.
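A minimal sketch of extracting page URLs from a sitemap with SimpleXML (error handling and nested sitemap-index files are left out):

  <?php
  // Fetch the sitemap and collect every <loc> entry.
  $xml = simplexml_load_string(file_get_contents('https://www.production.com/sitemap.xml'));

  $urls = [];
  foreach ($xml->url as $entry) {
      $urls[] = (string) $entry->loc;
  }
  // $urls could then be handed to the existing crawl command/source.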

Menu items not working

Describe the bug
None of the routes work after taking the latest changes from the repository. I am using the Vagrant-based setup.

To Reproduce
Steps to reproduce the behavior:
0. I set up a VirtualHost in Apache with the domain name dashboard.glitcherbot.local:

sudo /etc/apache2/sites-available/dashboard.glitcherbot.local.conf

<VirtualHost *:80>
    ServerAdmin [email protected]
    ServerName dashboard.glitcherbot.local
    ServerAlias www.dashboard.glitcherbot.local
    DocumentRoot /var/www/html
    DirectoryIndex index.php index.html

    ErrorLog ${APACHE_LOG_DIR}/dashboard.glitcherbot.com-error.log
    CustomLog ${APACHE_LOG_DIR}/dashboard.glitcherbot-access.log combined
</VirtualHost>

  1. Apache rewrite module enabled: sudo apache2ctl -M | grep rewrite && sudo a2enmod rewrite
  2. Restart apache server sudo systemctl restart apache2.service
  3. Go to 'http://dashboard.glitcherbot.local/ or https://192.168.33.10/'
  4. Click on 'any menu item'
  5. The route does not work: URLs without 'index.php' do not resolve, even though the rewrite is already defined in 'html/.htaccess'

Expected behavior
It should route through index.php automatically, so all the menu items can be navigated without the /index.php prefix.

Screenshots
File: src/templates/menu.twig
Not working
<a class="nav-link" href="/sites">Diffs</a>

Working
<a class="nav-link" href="/index.php/sites">Diffs</a>

Desktop (please complete the following information):

  • OS: Mac
  • Browser chrome
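One thing that may be worth checking (an assumption, since the VirtualHost above has no <Directory> block): the rewrite rules in html/.htaccess are only read when AllowOverride permits it for the document root, e.g.:

  <Directory /var/www/html>
      AllowOverride All
      Require all granted
  </Directory>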

Query specific status codes

When on the diff page it would be good to filter results by specific status codes. Say I only want to see pages which return a 500 status.
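A sketch of the kind of query that could back such a filter; the table and column names are assumptions about the schema:

  <?php
  // Fetch only the pages of a given crawl that returned a 500 status.
  $pdo = new PDO('sqlite:glitcherbot.sqlite');   // assumed connection
  $crawlId = 42;                                 // example crawl id

  $statement = $pdo->prepare(
      'SELECT url, status_code, size
         FROM crawl_results
        WHERE crawl_id = :crawl AND status_code = :code'
  );
  $statement->execute([':crawl' => $crawlId, ':code' => 500]);
  $pages = $statement->fetchAll(PDO::FETCH_ASSOC);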

Sitemap crawl seems to be broken

Unless I am misunderstanding how to use it, the sitemap crawl using the bot:crawl-xml-sitemap command is failing as follows:

execute:
php bin/visual_regression_bot.php bot:crawl-xml-sitemap sitemap.xml

fails with:
Sitemaps crawling>>>> PHP Fatal error: Uncaught Error: Call to undefined method ScraperBot\Source\XmlSitemapSource::getCurrentIndex() in /Projects/glitcherbot/src/ScraperBot/Command/CrawlSitesCommand.php:83
Stack trace:
#0 /Projects/glitcherbot/vendor/symfony/console/Command/Command.php(256): ScraperBot\Command\CrawlSitesCommand->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#1 /Projects/glitcherbot/vendor/symfony/console/Application.php(971): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#2 /Projects/glitcherbot/vendor/symfony/console/Application.php(290): Symfony\Component\Console\Application->doRunCommand(Object(ScraperBot\Command\CrawlXmlSitemapCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Compo in /Projects/glitcherbot/src/ScraperBot/Command/CrawlSitesCommand.php on line 83
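One possible direction for a fix, purely as a guess from the error message (the real CrawlSitesCommand/XmlSitemapSource code isn't shown here), would be to give XmlSitemapSource the getCurrentIndex() method the command expects:

  // Hypothetical addition to ScraperBot\Source\XmlSitemapSource; the property
  // name and semantics are guesses based on the error message, not the real code.
  public function getCurrentIndex(): int {
      return $this->currentIndex ?? 0;
  }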
