Comments (3)

edoardottt commented on August 17, 2024

Thank you so much @cyb3rjerry for your contribution, really appreciated.

Actually, you are right: this should not happen.
After some investigation, it seems that 99% of the URLs that should not pass are actually ignored, but a small number of them still get printed on the CLI. This is because URLs are printed on the CLI like this:

c.OnResponse(func(r *colly.Response) {
		fmt.Println(r.Request.URL.String())
...

This also prints ignored URLs when there is a long chain of redirects in which the last request is made by an ignored URL.
Imagine something like this:

  • Link1 redirects to Link2
  • Link2 redirects to Link3
  • ...
  • LinkToIgnore redirects to LinkN

This behavior makes cariddi print LinkToIgnore.
I don't have a solution for this; if you want to propose something, I'm all ears.

For now you can rely on the output files; I'm quite sure they don't include to-be-ignored URLs.
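
One possible direction, only as a rough sketch: check each URL against the ignore rules right before printing it in the callback. The helper name `shouldPrint` and the plain substring matching are assumptions made for illustration; they are not cariddi's actual ignore logic, and the sample URLs stand in for the values of `r.Request.URL.String()`.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldPrint reports whether rawURL may be printed. It returns false when
// the URL contains any non-empty ignore substring. The name and the
// substring rule are hypothetical, chosen only for this sketch.
func shouldPrint(rawURL string, ignore []string) bool {
	for _, s := range ignore {
		if s != "" && strings.Contains(rawURL, s) {
			return false
		}
	}
	return true
}

func main() {
	ignore := []string{"logout"}

	// Stand-ins for the URLs seen by the response callback.
	urls := []string{
		"https://example.com/page",
		"https://example.com/logout?next=/home",
	}

	for _, u := range urls {
		// Filter at print time instead of printing unconditionally.
		if shouldPrint(u, ignore) {
			fmt.Println(u)
		}
	}
}
```

With these sample inputs only `https://example.com/page` is printed. Whether a check like this at print time is enough for the redirect case described above would still need to be verified against real redirect chains.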

from cariddi.

cyb3rjerry commented on August 17, 2024

Sounds good! I'll tinker with this and get back to you at some point :)

Thanks again for the great tool btw!

edoardottt commented on August 17, 2024

Closed by #82
