Comments (12)
I understand what's going on, thanks.
My problem now is coming up with some kind of test that I can run against a potential fix.
I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try and fix it yourself; I would gladly review and merge it
from suckit.
Thanks for reporting this
from suckit.
Could you provide the command that triggered this behavior?
This behavior is caused not by a particular command but by a resource that refuses to be parsed. This resource is the dirtiest one; it is full of connection black holes. If you use fewer than 100 jobs it gets stuck entirely and then throws an exception, so try 10 to start rather than jumping straight to 100. To see it clearly, I recommend disabling --random-range and --tries.
./suckit -vc -j 100 --include-download ".pdf|.ps.Z" --exclude-download ".png|.jpg|.mp4" --random-range 10 --tries 3 -u "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.37" "https://www.cl.cam.ac.uk/"
from suckit.
It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, like 1G or something.
If I set a timeout of 30s it will solve this issue, but it might also break the download of a huge file depending on your connection speed. If I put, say, 30 minutes as the timeout value, I might get away with the huge-file case, but you will also get threads stuck for much longer...
Reqwest has a test for this, and it seems to me that they just throw a timeout error
from suckit.
> Thanks for reporting this

And as for feature wishes: there is no --timeout option, only --tries. Some web servers, when content is requested, never return it, or it is missing at the endpoint; as a result the download loads endlessly (a connection black hole) until the connection is broken, and the thread itself dies at that moment.
from suckit.
Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue
from suckit.
> Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue
I recorded only two endpoints that had problems with infinite loading, including in the browser; at the moment they throw a timeout or an exception.
https://www.cl.cam.ac.uk/~aas23/papers_aas/PoolSDA2.ps
https://www.cl.cam.ac.uk/~aas23/papers_aas/fair4.pdf
Perhaps these errors no longer occur because they began closing access to these files on their resource. If you don't have these endpoints among the downloaded files in the root directory, then that is what happened:
from suckit.
When trying to download both of them directly I get this error:
tcp connect error: No route to host (os error 113)
I don't notice any "stuck" thread
from suckit.
> When trying to download both of them directly I get this error:
> tcp connect error: No route to host (os error 113)
> I don't notice any "stuck" thread
Congratulations; I repeat, you don't have a client-side disconnect timeout for the case when the server never tells the client that a disconnect has occurred.
PREVIOUSLY, when trying to access these links from a browser, the page would load for about a minute and then the client would time out, not the server. NOW it is the server that times out, or the connection fails outright.
ABOUT 10 HOURS AGO I couldn't access any of the endpoints posted above. NOW some of them are available.
Threads still die, and it has nothing to do with this particular resource. This resource merely amplified the problem by the way it handles requests.
from suckit.
> I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try and fix it yourself; I would gladly review and merge it
I would join in, but right now this language is not in my stack. That will change soon.
I think adding just the client timeout would greatly reduce the chance of the thread problem, but not completely eliminate it.
from suckit.
> It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, like 1G or something.
> If I set a timeout of 30s it will solve this issue, but it might also break the download of a huge file depending on your connection speed. If I put, say, 30 minutes as the timeout value, I might get away with the huge-file case, but you will also get threads stuck for much longer...
> Reqwest has a test for this, and it seems to me that they just throw a timeout error
The problem is in the reqwest code. There should be a timeout for establishing the connection, but they made it global: it covers both initialization and the data transfer itself. Really, the timeout should only run until the server sends the next packet.
With so many dependencies, I'm actually surprised it works at all. It feels like the author decided to make a salad out of everything in the world. Now I'm thinking of writing my own client, because the rest of the implementation doesn't appeal to me much either. Perhaps there are other clients that work well. The problem is the dependencies themselves: one mistake spawns hundreds of others.
And can someone explain to me why direct JSON handling was added to a library for working with requests? It feels like JavaScripters have taken over the world. And this is only part of it; the divide-and-conquer principle apparently doesn't work for them.
Here is a simple illustration (uncomment the alternative signature and the other commented lines to add the timeout):
use std::fs::{remove_file, File};
use std::io::prelude::*;
use std::path::Path;
use reqwest::Client;
// use std::time::Duration;

// async fn download_file_with_timeout(url: &str, timeout: Duration, file_path: &str) -> Result<(), reqwest::Error> {
async fn download_file(url: &str, file_path: &str) -> Result<(), reqwest::Error> {
    // Remove a stale copy of the target file if one is already there.
    match Path::new(file_path).try_exists() {
        Ok(true) => match remove_file(file_path) {
            Ok(()) => println!("File removed!"),
            Err(_) => println!("File was not removed!"),
        },
        Ok(false) => println!("File does not exist yet"),
        Err(error) => println!("Can't check the file: {:?}", error),
    }

    let mut output_file = match File::create(file_path) {
        Ok(value) => {
            println!("File created!");
            value
        }
        Err(error) => panic!("Can't create the file: {:?}", error),
    };

    let client = Client::builder().build()?;
    // let client = Client::builder().timeout(timeout).build()?;
    let mut response = client.get(url).send().await?;
    // Without a timeout this loop can hang forever on a connection black hole.
    while let Some(chunk) = response.chunk().await? {
        output_file.write_all(&chunk).unwrap();
    }
    Ok(())
}

#[tokio::main]
async fn main() {
    let url = "https://speed.hetzner.de/10GB.bin";
    // let timeout = Duration::from_secs(30);
    let file_path = "/Users/$USER$/Downloads/10GB.bin";
    // if let Err(error) = download_file_with_timeout(url, timeout, file_path).await {
    if let Err(error) = download_file(url, file_path).await {
        eprintln!("Download failed: {:?}", error);
    }
}
from suckit.