
Comments (9)

FFrozTT avatar FFrozTT commented on June 3, 2024

Oh strange, it didn't look like that when I was testing this a few days ago. At the time, that page was loading normally in a browser.

In any case, it does appear to be working with that new link as I'm seeing a lot of results pour in now. Sorry for the false alarm.

from bathyscaphe.

creekorful avatar creekorful commented on June 3, 2024

Hello there! That's an error on my part.
The scheduler is removing fragments and query parameters from the URL. Removing the query parameters is a mistake, since they may affect the page content. This cleanup should definitely be removed.
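
To illustrate the distinction, here is a minimal sketch in Python (not the scheduler's actual code) of a cleanup that drops only the fragment and leaves the query string untouched. The function name normalize_url is hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Drop the fragment but keep the query string (hypothetical helper).

    The fragment is only interpreted by the client, while the query
    parameters are sent to the server and can change the page content.
    """
    parts = urlsplit(url)
    # Rebuild the URL with an empty fragment; the query is passed through as-is.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
```

With this approach, a search URL such as the one reported in this issue keeps its s, q and cmd parameters, and only a trailing #fragment (if any) is discarded.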

creekorful avatar creekorful commented on June 3, 2024

I was wrong, this cleanup was removed months ago.
Do you mind sharing the scheduler logs so that we can investigate?

FFrozTT avatar FFrozTT commented on June 3, 2024

Are there some other logs? All I have are these ones:

api_1            | time="2020-09-03T05:18:51Z" level=debug msg="Successfully published URL: http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?s=DRP&q=irc&cmd=Search%21"
scheduler_1      | time="2020-09-03T05:18:51Z" level=debug msg="Processing URL: http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?s=DRP&q=irc&cmd=Search%21"

Nothing ever hits the crawler.
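
To see at which stage a URL stops, the combined log output can be grouped by service. The following is a rough sketch that assumes every line has the same shape as the two lines above (service name, a pipe, then time, level and msg fields); the helper name services_seen is hypothetical.

```python
import re

# Matches lines shaped like the ones quoted above:
#   <service> | time="..." level=<level> msg="..."
LOG_LINE = re.compile(
    r'^(?P<service>\S+)\s*\|\s*time="(?P<time>[^"]+)" '
    r'level=(?P<level>\w+) msg="(?P<msg>.*)"$'
)


def services_seen(log_lines, url):
    """Return the services whose messages mention url, in first-seen order."""
    seen = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match and url in match.group("msg"):
            service = match.group("service")
            if service not in seen:
                seen.append(service)
    return seen
```

If the returned list contains the api and scheduler services but no crawler service, the URL was published and scheduled but never picked up for crawling.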

creekorful avatar creekorful commented on June 3, 2024

This looks strange. Have you tried running ./scripts/log.sh scheduler?

FFrozTT avatar FFrozTT commented on June 3, 2024

ya, that's where this one comes from:
scheduler_1 | time="2020-09-03T05:18:51Z" level=debug msg="Processing URL: http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?s=DRP&q=irc&cmd=Search%21"

creekorful avatar creekorful commented on June 3, 2024

I need to check the DOM of the page, maybe there's something in there that's causing trouble for the crawler.

creekorful avatar creekorful commented on June 3, 2024

$ curl --socks5-hostname localhost:9050 'http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?s=DRP&q=irc&cmd=Search%21'
[...]
<h2>This domain has been migrated to Onion version 3.</h2>
[...]
From now on, to access <b>Torch: Tor Search Engine</b> service you must use this Onion domain name:<br><br>
     <a href="http://xmh57jrknzkhv6y3ls3ubitzfqnkrwxhopf5aygthi7d6rplyvk3noyd.onion"><h3>xmh57jrknzkhv6y3ls3ubitzfqnkrwxhopf5aygthi7d6rplyvk3noyd.onion</h3></a><br><br>
[...]

Looks like these links are not valid anymore. The server is returning an HTTP 404 (no redirection), so there's no way for the crawler to find the page. Maybe try to crawl again with the updated Torch URL?
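
Since the 404 page only mentions the new address in its HTML body, the link has to be read from the markup. As a rough sketch (not something the crawler does, as far as this thread shows), the version 3 address could be pulled out of such a notice with Python's standard library. The names OnionLinkParser and find_v3_onion_links are hypothetical, and the pattern assumes that a v3 onion host is 56 base32 characters followed by .onion.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit

# Assumption: a v3 onion host is 56 base32 characters (a-z, 2-7) plus ".onion".
V3_ONION_HOST = re.compile(r"^[a-z2-7]{56}\.onion$")


class OnionLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at a v3 onion host."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # urlsplit().hostname is lowercased, or None when there is no host.
        host = urlsplit(href).hostname or ""
        if V3_ONION_HOST.match(host):
            self.links.append(href)


def find_v3_onion_links(html: str) -> List[str]:
    """Return every v3 onion link found in the given HTML (hypothetical helper)."""
    parser = OnionLinkParser()
    parser.feed(html)
    return parser.links
```

Feeding the body returned by the curl command above into find_v3_onion_links would give the new Torch address, which could then be submitted for crawling again.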

creekorful avatar creekorful commented on June 3, 2024

No problem at all!

BTW I'm a bit curious: why exactly are you giving Trandoshan a try? Personal project? Company? Student? Just for fun?
I'm happy to know that people are still interested in it :)

If you're up for it, let's talk about this over whatever communication channel you prefer.
The full list of my contact channels is available here: https://creekorful.dev
