Comments (16)

isaacna commented on June 12, 2024

The error happened again, and this time we had the log statement for some more details:
https://github.com/CouncilDataProject/test-deployment/runs/3854359377?check_suite_focus=true#step:8:433

Based on the error message, this means that the validator method really did return False.

Since the validation that takes place here checks whether the resource exists, it must be that fsspec just isn't finding the resource. I have no idea why this would happen unless it's some kind of transient error.

Will debug further, probably by looking into fsspec.

evamaxfield commented on June 12, 2024

It honestly may be just that we are storing that file and then instantly checking if it exists. I think most storage systems are "eventually consistent", so it may take some time for Google to actually return the result after storing. (Especially when we are uploading hundreds of files at a time...)
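
If that's the cause, the race would look roughly like this (a minimal sketch assuming the file store is GCS accessed through gcsfs, which is an assumption about our storage layer):

import gcsfs  # assumption: the deployment's file store is GCS via gcsfs

fs = gcsfs.GCSFileSystem()
fs.put("local/file.pdf", "some-bucket/file.pdf")  # hypothetical paths
# if listings are eventually consistent (or served from a stale cache),
# this can briefly return False even though the upload above succeeded
print(fs.exists("some-bucket/file.pdf"))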

evamaxfield commented on June 12, 2024

Happened on the new seattle-staging deployment too: https://github.com/CouncilDataProject/seattle-staging/runs/3873780139?check_suite_focus=true#step:9:305

evamaxfield commented on June 12, 2024

@isaacna I think this may also be timeout errors or similar: fsspec/filesystem_spec#619 (comment)

I am seeing a lot of failures on seattle-staging backfill runs because of timeout errors: https://github.com/CouncilDataProject/seattle-staging/runs/3875382085?check_suite_focus=true#step:9:157

isaacna commented on June 12, 2024

> It honestly may be just that we are storing that file and then instantly checking if it exists. I think most storage systems are "eventually consistent", so it may take some time for Google to actually return the result after storing. (Especially when we are uploading hundreds of files at a time...)

I'm kinda confused about this, but for this specific case isn't it checking that the remote resource https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf exists via an HTTP request? Or am I misunderstanding what fsspec.url_to_fs does?

I was thinking that sometimes there's just an error while making the HTTP request (for the non-Google URIs): https://github.com/intake/filesystem_spec/blob/master/fsspec/implementations/http.py#L305-L315
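
For reference, my understanding is that the check boils down to something like this (a minimal sketch; the actual validator code may differ):

import fsspec

# resolve the URL to an HTTP filesystem and check that the resource exists
fs, path = fsspec.url_to_fs(
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
)
print(fs.exists(path))  # a transient HTTP failure here would surface as False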

Also, as for the timeout errors during resource_copy, I don't think that's related to the validator issue, but it's still a problem. Maybe we could add some retry logic to resource_copy?
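
Something like this is what I have in mind (a sketch only; resource_copy's real signature may differ):

import time

def resource_copy_with_retry(uri, dst, attempts=3, backoff=2.0):
    # hypothetical wrapper around our resource_copy; retries with a growing delay
    for attempt in range(1, attempts + 1):
        try:
            return resource_copy(uri, dst)  # assumed to be in scope
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)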

isaacna commented on June 12, 2024

I also created a script that just creates a file store using fsspec and checks whether the resource exists. I ran it in a loop but haven't seen a case where the resource suddenly isn't found. Maybe the GitHub Actions runner is just more prone to HTTP errors or timeouts?

This may be going too deep into this specific issue, but HttpFileSystem's exists uses a GET request, and according to this, HEAD is more efficient. Maybe we could try that if intermittent HTTP errors are what's giving us trouble?
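
If we wanted to test that, one option is a small subclass that overrides the existence check to use HEAD (a hedged sketch; it leans on fsspec's async internals like set_session, which may shift between versions):

import aiohttp
from fsspec.implementations.http import HTTPFileSystem

class HeadExistsHTTPFileSystem(HTTPFileSystem):
    # sketch: check existence with a cheap HEAD request instead of the default GET
    async def _exists(self, path, **kwargs):
        session = await self.set_session()
        try:
            async with session.head(path) as r:
                return r.status < 400
        except aiohttp.ClientError:
            return False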

isaacna commented on June 12, 2024

For the FSTimeoutError: the videos we're trying to download in the run you sent earlier are pretty large, all of them 2+ hours, with one over 3 hours.

The ones we use in the test deployment data tend to be much shorter, with only one of them being close to 3 hours and most being under 30 minutes: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/pipeline/mock_get_events.py#L54-L69

evamaxfield commented on June 12, 2024

> I also created a script that just creates a file store using fsspec and checks whether the resource exists. I ran it in a loop but haven't seen a case where the resource suddenly isn't found. Maybe the GitHub Actions runner is just more prone to HTTP errors or timeouts?
>
> This may be going too deep into this specific issue, but HttpFileSystem's exists uses a GET request, and according to this, HEAD is more efficient. Maybe we could try that if intermittent HTTP errors are what's giving us trouble?

Oh interesting... Making a HEAD request for .exists on HttpFileSystem seems smart.

> For the FSTimeoutError: the videos we're trying to download in the run you sent earlier are pretty large, all of them 2+ hours, with one over 3 hours.

Yeah, those are likely real timeouts then. In which case we should add timeout={some_int}. And btw, we already have retry=3 on the task.
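
Something like this is what I mean (a sketch; the URL and value are placeholders, and whether this kwarg actually reaches aiohttp is worth verifying):

import fsspec

fs, path = fsspec.url_to_fs("https://example.com/long-video.mp4")  # placeholder URL
fs.get(path, "video.mp4", timeout=1800)  # intent: allow up to 30 minutes per download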

isaacna commented on June 12, 2024

It looks like event gather runs 160-164 (the ones since bumping the cdp-backend version) haven't encountered the "could not be archived" issue. I'll wait for a few more runs before saying that the HEAD request fixed the issue, but fingers crossed that it did.

isaacna commented on June 12, 2024

> Yeah, those are likely real timeouts then. In which case we should add timeout={some_int}. And btw, we already have retry=3 on the task.

Also, I may just not be reading the fsspec docs right, but where is the timeout property that you're passing to fs.get actually defined in fsspec? I don't see it in HttpFileSystem, and its parent class AsyncFileSystem is pretty confusing. It looks like the FSTimeoutError gets thrown here, but I'm not sure the timeout param we're passing actually makes it to this method.

evamaxfield commented on June 12, 2024

Yep. I have seen far fewer errors in the logs as well.

> Also, I may just not be reading the fsspec docs right, but where is the timeout property that you're passing to fs.get actually defined in fsspec? I don't see it in HttpFileSystem, and its parent class AsyncFileSystem is pretty confusing. It looks like the FSTimeoutError gets thrown here, but I'm not sure the timeout param we're passing actually makes it to this method.

I know. I have been digging through the code as well and the AsyncFileSystem is hard to navigate. I was able to reproduce the timeout error on my local machine so I can try to debug it as well.

I will say, this issue and #120 combined are the primary reasons the v3 pipeline fails. More on #120 in its issue comments.

isaacna commented on June 12, 2024

> I know. I have been digging through the code as well and the AsyncFileSystem is hard to navigate. I was able to reproduce the timeout error on my local machine so I can try to debug it as well.

Sounds good. There's probably some simple way to increase the timeout, but it isn't obvious from the docs or code. It may be worth cloning fsspec and messing around with it directly.

> I will say, this issue and #120 combined are the primary reasons the v3 pipeline fails. More on #120 in its issue comments.

Yeah, I think fixing these should be a priority, but at least issue #120 seems to have a fairly straightforward solution.

evamaxfield commented on June 12, 2024

> Sounds good. There's probably some simple way to increase the timeout, but it isn't obvious from the docs or code. It may be worth cloning fsspec and messing around with it directly.

If you find an event that is over 3 hours and like 10 minutes, you can probably just give fsspec + that weird HTTP options dict a try. Errrr, really just find an event video that is really long and run the pipeline with the -f and -t options with dates that surround the video date, imo.

isaacna commented on June 12, 2024

> fsspec + that weird HTTP options dict a try

Which HTTP options are you referring to? Also, if we keep hitting timeouts with fsspec, we could consider using something else like requests or urllib instead.
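
For example, a plain-requests existence check with an explicit timeout could look like this (a sketch, not what the pipeline does today):

import requests

def resource_exists(url, timeout=10.0):
    # HEAD keeps the check cheap; treat any request failure as "not found"
    try:
        return requests.head(url, allow_redirects=True, timeout=timeout).ok
    except requests.RequestException:
        return False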

evamaxfield commented on June 12, 2024

From this comment: fsspec/filesystem_spec#619 (comment)

import aiohttp
import fsspec

with fsspec.open('filecache::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_land.zip',
                 https={'client_kwargs': {'timeout': aiohttp.ClientTimeout(total=1)}}) as f:
    f.read()  # a 1-second total timeout, so a slow transfer raises FSTimeoutError

I am wondering if we can pass that https={...} to the .get function.

isaacna commented on June 12, 2024

Looked into the issue and I figured out how to pass the client_kwargs to HttpFileSystem. In our case we'd have to pass them to url_to_fs so that it instantiates the aiohttp.ClientSession in HttpFileSystem with those kwargs. Will put out a PR sometime tomorrow.
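
Roughly, the idea is this (a sketch of the mechanism; the real PR may wire it differently):

import aiohttp
import fsspec

# extra kwargs to url_to_fs go to the filesystem constructor, so HttpFileSystem
# builds its aiohttp.ClientSession with a much longer total timeout
fs, path = fsspec.url_to_fs(
    "https://example.com/long-video.mp4",  # placeholder URL
    client_kwargs={"timeout": aiohttp.ClientTimeout(total=3 * 60 * 60)},
)
fs.get(path, "video.mp4")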
