Comments (12)

Skallwar commented on June 12, 2024

I understand what's going on, thanks.

My problem now is coming up with some kind of test I can run to validate a potential fix.

I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try to fix it yourself; I would gladly review and merge it.
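
For what it's worth, a connection black hole is easy to fake locally for such a test. A minimal sketch (hypothetical, not part of suckit's actual test suite): a server that accepts TCP connections but never writes a byte, so a client with no timeout hangs on it forever.

use std::net::TcpListener;
use std::thread;
use std::time::Duration;

// Hypothetical test helper: bind to an ephemeral port, accept every
// connection, and never send anything back.
fn spawn_black_hole() -> String {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            // Hold the connection open and stay silent.
            thread::spawn(move || {
                let _keep_alive = stream;
                loop {
                    thread::sleep(Duration::from_secs(60));
                }
            });
        }
    });

    format!("http://{}/", addr)
}

A test could then point suckit (or a bare reqwest client) at the returned URL and assert that the request fails with a timeout instead of hanging.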

Skallwar commented on June 12, 2024

Thanks for reporting this

xavetar commented on June 12, 2024

Could you provide the command that triggered this behavior?

This behavior is caused not by a command but by a resource that does not want to be parsed. This resource is the dirtiest of them; it is full of connection black holes. If you use fewer than 100 jobs, it will get stuck entirely and then throw an exception. You can start with 10 rather than going straight to 100. To see it clearly, I recommend disabling --random-range and --tries.

./suckit -vc -j 100 --include-download ".pdf|.ps.Z" --exclude-download ".png|.jpg|.mp4" --random-range 10 --tries 3 -u "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.37" "https://www.cl.cam.ac.uk/"

Skallwar commented on June 12, 2024

It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, 1 GB or so.

If I set a timeout of 30s it will solve this issue, but it might also break the download of a huge file, depending on your connection speed. If I put, say, 30 minutes as the timeout value, then I might get away with the huge-file case, but you will also get threads stuck for much longer...

Reqwest has a test for this, and it seems to me that they just throw a timeout error.
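
One possible middle ground (a sketch of an assumed design, not suckit's or reqwest's actual behavior) is to bound the wait for each body chunk with tokio's timeout instead of putting one deadline on the whole transfer: a huge file keeps downloading as long as bytes keep arriving, while a black-holed connection is cut off after a short silence.

use std::time::Duration;
use tokio::time::timeout;

// Bound the wait for each body chunk rather than the whole transfer.
async fn download_with_idle_timeout(
    url: &str,
    idle: Duration,
) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let mut response = reqwest::Client::new().get(url).send().await?;
    let mut body = Vec::new();

    loop {
        // `chunk()` resolves with the next piece of the body (or None at
        // the end); if nothing arrives within `idle`, `timeout` returns Err.
        match timeout(idle, response.chunk()).await {
            Ok(Ok(Some(chunk))) => body.extend_from_slice(&chunk),
            Ok(Ok(None)) => break,                               // end of body
            Ok(Err(e)) => return Err(e.into()),                  // transport error
            Err(_) => return Err("no data for too long".into()), // idle timeout
        }
    }

    Ok(body)
}

With that, the 30-seconds-versus-30-minutes dilemma disappears, since the bound is on silence rather than on total transfer time.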

xavetar commented on June 12, 2024

Thanks for reporting this

As for wishes: there is no --timeout option, only --tries. Some web servers, when their content is being parsed, never return the content, or it is missing at the endpoint, and as a result requests load endlessly (a connection black hole) until the connection is broken, at which moment the thread itself dies too.

Skallwar commented on June 12, 2024

Could you provide the command that triggered this behavior?

Skallwar commented on June 12, 2024

Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue.

xavetar commented on June 12, 2024

Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue.

I recorded only two endpoints that had the infinite-loading problem (it reproduced even in the browser); at the moment they throw a timeout or an exception instead.

  1. https://www.cl.cam.ac.uk/~aas23/papers_aas/PoolSDA2.ps
  2. https://www.cl.cam.ac.uk/~aas23/papers_aas/fair4.pdf

Perhaps these errors no longer occur because they have begun closing off access to these files on their site. If these endpoints did not end up downloaded in your root directory, it looks like this:

[screenshot]

Skallwar commented on June 12, 2024

When trying to download both of them directly I get this error:

tcp connect error: No route to host (os error 113)

I don't notice any "stuck" thread

xavetar commented on June 12, 2024

When trying to download both of them directly I get this error:

tcp connect error: No route to host (os error 113)

I don't notice any "stuck" thread

Congratulations. I repeat: there is no timeout for disconnecting from the server when the server doesn't tell the client that a disconnect has occurred.

PREVIOUSLY, when trying to access content at these links from a browser, the page would load for about a minute and then the client would time out, not the server. NOW the server times out or the connection fails.

ABOUT 10 HOURS AGO, I couldn't access any of the endpoints posted above. NOW some of them are available.

Threads have always died, and it had nothing to do with this site. This site merely amplifies the effect by the way it processes requests.

xavetar commented on June 12, 2024

I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try to fix it yourself; I would gladly review and merge it.

I would join in, but right now this language isn't in my stack. That will change soon.

I think adding just the client timeout would greatly reduce the chance of the thread problem, but not completely eliminate it.

xavetar commented on June 12, 2024

It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, 1 GB or so.

If I set a timeout of 30s it will solve this issue, but it might also break the download of a huge file, depending on your connection speed. If I put, say, 30 minutes as the timeout value, then I might get away with the huge-file case, but you will also get threads stuck for much longer...

Reqwest has a test for this, and it seems to me that they just throw a timeout error.

The problem is in the reqwest code. The timeout ought to apply while the connection is being established, but they made it global: it keeps running both after initialization and during the data transfer. In fact, the timeout should only count until the server sends the next packet, and reset after each one.
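
For reference, here is a sketch of the two knobs reqwest's ClientBuilder does expose: connect_timeout bounds only connection establishment, while timeout is the global deadline described above; neither resets when a new packet arrives.

use std::time::Duration;

// Both methods exist on reqwest's ClientBuilder; neither is a per-packet
// (idle) timeout that resets whenever the server sends more data.
fn build_client() -> Result<reqwest::Client, reqwest::Error> {
    reqwest::Client::builder()
        .connect_timeout(Duration::from_secs(10)) // connect phase only
        .timeout(Duration::from_secs(30))         // whole request, body included
        .build()
}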

With so many dependencies, I'm actually surprised it works. It feels like the author decided to make a salad out of everything in the world. Now I'm thinking of writing my own client, because the existing implementations don't appeal to me much. Perhaps there are other clients that work well, but the dependencies themselves are the problem, and one mistake spawns hundreds of others.

And can someone explain to me why direct JSON handling was added to a library for making requests? It feels like the JavaScript crowd has taken over the world. And this is only part of it; the principle of divide and conquer apparently does not work for them.

Here is a simple illustration (pass None as the timeout to reproduce the hang, or a duration to see reqwest cut the transfer off):

use std::fs::{remove_file, File};
use std::io::prelude::*;
use std::path::Path;
use std::time::Duration;
use reqwest::Client;

async fn download_file(url: &str, file_path: &str, timeout: Option<Duration>) -> Result<(), reqwest::Error> {
    // Remove a leftover file from a previous run, if there is one.
    if Path::new(file_path).try_exists().unwrap_or(false) {
        match remove_file(file_path) {
            Ok(()) => println!("File removed!"),
            Err(_) => println!("File is not removed!"),
        }
    }

    let mut output_file = match File::create(file_path) {
        Ok(file) => {
            println!("File is created!");
            file
        }
        Err(error) => panic!("Can't create the file: {:?}", error),
    };

    // Without a timeout the client can hang forever on a black-holed
    // connection; with one, reqwest aborts the *entire* request (including
    // the body transfer) once the deadline passes.
    let mut builder = Client::builder();
    if let Some(duration) = timeout {
        builder = builder.timeout(duration);
    }
    let client = builder.build()?;

    let mut response = client.get(url).send().await?;
    while let Some(chunk) = response.chunk().await? {
        output_file.write_all(&chunk).unwrap();
    }

    Ok(())
}

#[tokio::main]
async fn main() {
    let url = "https://speed.hetzner.de/10GB.bin";
    let file_path = "/Users/$USER$/Downloads/10GB.bin";

    // Pass `None` to reproduce the hang, or a duration to see the timeout
    // error fire mid-download on a file this large.
    if let Err(error) = download_file(url, file_path, Some(Duration::from_secs(30))).await {
        println!("Download failed: {:?}", error);
    }
}
