
glitcherbot's People

Contributors

alex-moreno, ashwinparmar, iainp999, rpkoller, tguilpain


glitcherbot's Issues

Run crawler inside Docker

At the moment the only ways to run the crawler are via Vagrant or by installing the dependencies locally.

Ideally we'd have a way to run the crawler inside a Docker container, so dependencies are reduced as much as possible if you choose to use Docker instead of Vagrant.
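A minimal sketch of what such an image could look like, using the existing bin/visual_regression_bot.php entry point; the PHP version, extensions and file layout are assumptions, not taken from the project:

  # Sketch only: base image and required extensions are assumptions.
  FROM php:8.1-cli

  # Composer is needed to install the project's dependencies.
  COPY --from=composer:2 /usr/bin/composer /usr/bin/composer

  WORKDIR /app
  COPY . /app
  RUN composer install --no-interaction --no-dev

  # Run the crawler; the CSV would be baked into the image or mounted at run time.
  ENTRYPOINT ["php", "bin/visual_regression_bot.php"]
  CMD ["bot:crawl-sites", "sites.csv"]

With something like that in place, docker build -t glitcherbot . followed by docker run --rm glitcherbot would run the crawler with no local PHP or Composer install needed.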

Show a howto page instead of a broken one

On a brand new install, or when the database is empty because there has been no crawl yet, most of the pages will display an error.

Instead it would be great to show what the user needs to do in order to start seeing things happen.

Highlighting of changes

When a status code or size changes between crawls, this should be highlighted in the table of results.

For the size, a threshold should be used to define the acceptable tolerance.

We can use Bootstrap classes for the highlighting:

  • table-danger if the size changes outside of tolerances or if the status code changes
  • table-warning if the size changes but is inside tolerances
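A minimal sketch of how the class could be picked; the 10% tolerance and the result array keys are assumptions, not the project's actual API:

  // Sketch: pick the Bootstrap row class from the previous and current results.
  function rowClass(array $previous, array $current, float $tolerance = 0.10): string {
      if ($previous['status'] !== $current['status']) {
          return 'table-danger';
      }
      $delta = abs($current['size'] - $previous['size']);
      if ($previous['size'] > 0 && ($delta / $previous['size']) > $tolerance) {
          return 'table-danger';
      }
      return $delta > 0 ? 'table-warning' : '';
  }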

Add logging

Push any error from the process into a log file, perhaps somewhere under /etc/*.
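A rough sketch of what this could look like if Monolog were added as a dependency (not currently confirmed in the project); the log path is only an example, and /var/log is the more conventional location for logs than /etc:

  <?php
  // Sketch assuming Monolog were a dependency; path and channel name are examples.
  require __DIR__ . '/vendor/autoload.php';

  use Monolog\Logger;
  use Monolog\Handler\StreamHandler;

  $logger = new Logger('glitcherbot');
  $logger->pushHandler(new StreamHandler('/var/log/glitcherbot/crawler.log', Logger::WARNING));

  $url = 'https://www.production.com/test-page';
  try {
      // the real crawling call would go here
      throw new \RuntimeException('simulated crawl failure');
  } catch (\Throwable $e) {
      $logger->error('Crawl failed', ['url' => $url, 'exception' => $e->getMessage()]);
  }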

PHP Errors and Warnings

Describe the bug

There are a few notices and warnings being produced by the tool when in use. For example:

Notice: Undefined index: date1 in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined index: date2 in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined variable: persistNaughty in src/ScraperBot/Routing/Controllers/SitesController.php on line 84
Notice: Undefined variable: persistLatest in src/ScraperBot/Routing/Controllers/SitesController.php on line 84

To Reproduce

Errors are produced on most pages of the site, but also when using the command line interface.

An error is also produced by a missing favicon.ico file.

Expected behavior

No errors should be produced during the normal use of the tool.
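A possible fix sketch for the notices listed above; how SitesController actually reads its parameters isn't shown here, so the superglobal access is an assumption:

  // Default the request parameters and flags before they are used
  // (around line 84 of SitesController in the report above).
  $date1 = $_GET['date1'] ?? null;
  $date2 = $_GET['date2'] ?? null;
  $persistNaughty = $persistNaughty ?? false;
  $persistLatest  = $persistLatest ?? false;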

Compare two environments with different base urls

At the moment the comparison between crawls is done using the whole URL.

It would be useful to select specific crawls for which we compare just the final part of the URL, so we can have different environments, e.g. stage and prod, and compare results between the two.
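One way this could work is to key crawl results by URL path rather than by full URL, so stage and prod entries for the same page line up; a minimal sketch (the function name and normalisation rules are illustrative only):

  // Reduce a URL to its path so the same page matches across environments.
  function pathKey(string $url): string {
      $path = parse_url($url, PHP_URL_PATH) ?? '/';
      return rtrim($path, '/') ?: '/';
  }

  // Both of these map to "/test-page":
  // pathKey('https://www.production.com/test-page');
  // pathKey('https://www.development.com/test-page/');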

Suggest an easier way to get up and running with the reporting tool

Whilst I think the inclusion of a full Vagrant/Docker setup is a good idea, I think that providing a simpler way to get up and running might be a good thing and create a lower barrier to entry. I had some trouble getting the Vagrant commands to run correctly on my setup, so I resorted to an alternative method that worked well.

Can I therefore suggest alternate instructions that allow the site to be served directly using the built-in PHP server? All that's needed is the following command.

php -S 0.0.0.0:8000 -t html html/index.php

I was able to get the site up and running using this and view my statistics. This command can be wrapped in the scripts section of composer.json so it can be run easily without having to remember it. Adding this:

  "scripts": {
    "start" : "php -S 0.0.0.0:8000 -t public public/index.php",
  }

Allows the following to be run:

composer start

This does assume that the user has all of the local dependencies installed, which are nicely detailed in the composer.json file :)

Workflow for comparing two environments

Thank you for working on a non-js based regression tool. 👍

I am having trouble diffing two different environments, I am assuming the workflow is the following:

  1. Create two sample CSV files: one with production URLs and the other with the corresponding URLs from your development environment, so that the same page appears in each CSV, e.g. www.production.com/test-page and www.development.com/test-page (see the sample files after this list)
  2. Run php bin/visual_regression_bot.php bot:crawl-sites production-site.csv
  3. Run php bin/visual_regression_bot.php bot:crawl-sites development-site.csv
  4. Run php bin/visual_regression_bot.php bot:compare-crawls production-site.csv development-site.csv
  5. Go to http://localhost/ and click on "changed sites"
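If that workflow is right, the two files from step 1 might look something like this (the exact column layout the bot expects isn't confirmed here; one URL per line and a second example page are assumed):

  production-site.csv:
  www.production.com/test-page
  www.production.com/contact

  development-site.csv:
  www.development.com/test-page
  www.development.com/contact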

Add a plug-in system

If we use some plugin or API kind of mechanism, we'd be able to trigger snapshots when, for example, we detect that a threshold has been passed.
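A rough sketch of the kind of hook such a plug-in system could expose; the interface and class names are purely illustrative:

  // Illustrative plug-in hook: called after each page is crawled.
  interface CrawlEventSubscriber {
      public function onPageCrawled(array $previous, array $current): void;
  }

  // Example plug-in that would trigger a snapshot once a size threshold is passed.
  final class SnapshotOnThreshold implements CrawlEventSubscriber {
      public function onPageCrawled(array $previous, array $current): void {
          $delta = abs($current['size'] - $previous['size']);
          if ($previous['size'] > 0 && $delta / $previous['size'] > 0.10) {
              // take/queue a snapshot of $current['url'] here
          }
      }
  }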

Indexing sites in db

At the moment the sites are only kept in the CSV or the JSON. When doing the first crawl we need to store those sites and their indexes in a new table.
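A sketch of the kind of table that could hold the indexed sites; the SQLite syntax and column names are assumptions about the schema:

  <?php
  // Assumed schema: one row per site, with a numeric index and the URL.
  $pdo = new PDO('sqlite:glitcherbot.sqlite');
  $pdo->exec('CREATE TABLE IF NOT EXISTS sites (
      site_index INTEGER PRIMARY KEY,
      url        TEXT NOT NULL UNIQUE
  )');

  // Populate the table from the CSV/JSON list on the first crawl.
  $insert = $pdo->prepare('INSERT OR IGNORE INTO sites (url) VALUES (:url)');
  $insert->execute([':url' => 'www.production.com/test-page']);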

Improve output

Potentially use ncurses for the terminal output, and maybe clean up the output by moving the detail into a log file.

Allow users to use the crawler on the browser

I think a nice feature would be to provide a UI similar to https://validator.w3.org/ where users can either paste a list of URLs or upload a CSV or JSON file to be processed.

Tasks:

  1. Provide an upload field to let users upload a CSV or JSON file.
  2. A textarea field to allow people to paste a list of URLs.
  3. Add validation of the data in the backend (see the sketch after this list).
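A minimal sketch of the backend validation in task 3, assuming the pasted URLs arrive as a newline-separated POST field called urls (a hypothetical name):

  <?php
  // Split the pasted text into lines and validate each URL.
  $lines = preg_split('/\R+/', trim($_POST['urls'] ?? ''), -1, PREG_SPLIT_NO_EMPTY);

  $valid = [];
  $invalid = [];
  foreach ($lines as $line) {
      $url = trim($line);
      if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
          $valid[] = $url;
      } else {
          $invalid[] = $url;
      }
  }
  // $valid can be queued for crawling; $invalid reported back to the user.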

Capture errors in the database

When any errors happen, it would be good to log them in the database, so that when drilling down on a site we can find what went wrong at the moment we were trying to fetch it.

Automatic Sitemap crawling

This is a TODO.

Ideally we should also be able to provide a sitemap, so the user can simply give a main URL and the bot can crawl everything under that URL.
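A minimal sketch of extracting page URLs from a sitemap with SimpleXML (error handling and nested sitemap-index files are left out):

  <?php
  // Fetch the sitemap and collect every <loc> entry.
  $xml = simplexml_load_string(file_get_contents('https://www.production.com/sitemap.xml'));

  $urls = [];
  foreach ($xml->url as $entry) {
      $urls[] = (string) $entry->loc;
  }
  // $urls could then be handed to the existing crawl command/source.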

Menu items not working

Describe the bug
None of the routes work after taking the latest changes from the repository. I am using the Vagrant-based setup.

To Reproduce
Steps to reproduce the behavior:
0. I set up a VirtualHost in Apache with the domain name dashboard.glitcherbot.local:

sudo /etc/apache2/sites-available/dashboard.glitcherbot.local.conf

<VirtualHost *:80>
    ServerAdmin [email protected]
    ServerName dashboard.glitcherbot.local
    ServerAlias www.dashboard.glitcherbot.local
    DocumentRoot /var/www/html
    DirectoryIndex index.php index.html

    ErrorLog ${APACHE_LOG_DIR}/dashboard.glitcherbot.com-error.log
    CustomLog ${APACHE_LOG_DIR}/dashboard.glitcherbot-access.log combined
</VirtualHost>

  1. Apache rewrite module enabled: sudo apache2ctl -M | grep rewrite && sudo a2enmod rewrite
  2. Restart apache server sudo systemctl restart apache2.service
  3. Go to 'http://dashboard.glitcherbot.local/ or https://192.168.33.10/'
  4. Click on 'any menu item'
  5. The route does not work: URLs without 'index.php' do not resolve, even though the rewrite is already defined in 'html/.htaccess'

Expected behavior
It should route through index.php automatically, so all the menu items can be navigated without the /index.php prefix.

Screenshots
File: src/templates/menu.twig
Not working
<a class="nav-link" href="/sites">Diffs</a>

Working
<a class="nav-link" href="/index.php/sites">Diffs</a>

Desktop (please complete the following information):

  • OS: Mac
  • Browser chrome
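One thing that may be worth checking (an assumption, since the VirtualHost above has no <Directory> block): the rewrite rules in html/.htaccess are only read when AllowOverride permits it for the document root, e.g.:

  <Directory /var/www/html>
      AllowOverride All
      Require all granted
  </Directory>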

Query specific status codes

When on the diff page it would be good to filter results by specific status codes. Say I only want to see pages which return a 500 status.
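A sketch of the kind of query that could back such a filter; the table and column names are assumptions about the schema:

  <?php
  // Fetch only the pages of a given crawl that returned a 500 status.
  $pdo = new PDO('sqlite:glitcherbot.sqlite');   // assumed connection
  $crawlId = 42;                                 // example crawl id

  $statement = $pdo->prepare(
      'SELECT url, status_code, size
         FROM crawl_results
        WHERE crawl_id = :crawl AND status_code = :code'
  );
  $statement->execute([':crawl' => $crawlId, ':code' => 500]);
  $pages = $statement->fetchAll(PDO::FETCH_ASSOC);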

Sitemap crawl seems to be broken

Unless I am misunderstanding how to use it, the sitemap crawl using the bot:crawl-xml-sitemap command is failing as follows:

execute:
php bin/visual_regression_bot.php bot:crawl-xml-sitemap sitemap.xml

fails with:
Sitemaps crawling>>>> PHP Fatal error: Uncaught Error: Call to undefined method ScraperBot\Source\XmlSitemapSource::getCurrentIndex() in /Projects/glitcherbot/src/ScraperBot/Command/CrawlSitesCommand.php:83
Stack trace:
#0 /Projects/glitcherbot/vendor/symfony/console/Command/Command.php(256): ScraperBot\Command\CrawlSitesCommand->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#1 /Projects/glitcherbot/vendor/symfony/console/Application.php(971): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#2 /Projects/glitcherbot/vendor/symfony/console/Application.php(290): Symfony\Component\Console\Application->doRunCommand(Object(ScraperBot\Command\CrawlXmlSitemapCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Compo in /Projects/glitcherbot/src/ScraperBot/Command/CrawlSitesCommand.php on line 83
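One possible direction for a fix, purely as a guess from the error message (the real CrawlSitesCommand/XmlSitemapSource code isn't shown here), would be to give XmlSitemapSource the getCurrentIndex() method the command expects:

  // Hypothetical addition to ScraperBot\Source\XmlSitemapSource; the property
  // name and semantics are guesses based on the error message, not the real code.
  public function getCurrentIndex(): int {
      return $this->currentIndex ?? 0;
  }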
