
Comments (6)

tw4l commented on June 21, 2024

@Chickensoupwithrice In your court now if you want to try to figure this out :)

from browsertrix.

Chickensoupwithrice commented on June 21, 2024

Alright, after much experimenting I've managed to nail down exactly where we stop using generators and instead load all the logs into memory. It's in the way we call `stream_log_bytes_as_line_dicts`: the function itself does return a generator, but we extend a list with its output, which consumes the whole generator, loads the entire log file into memory, and eventually runs us out of memory.

Switching `extend` to `append` does mean we're generators all the way down, but then when I try to iterate over that generator, I get read timeouts on the DO space?

(screenshot attached in the original comment)

Still investigating.
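
To make the `extend` vs. `append` difference concrete, here is a minimal sketch. `fake_log_lines` is a hypothetical stand-in for `stream_log_bytes_as_line_dicts` (whose real signature isn't shown in this thread), and the `consumed` list exists only to show how many lines were actually pulled from the generator at each point.

```python
from itertools import chain

consumed = []  # records every line the stand-in generator actually produces


def fake_log_lines(name, count):
    """Hypothetical stand-in for stream_log_bytes_as_line_dicts."""
    for i in range(count):
        consumed.append((name, i))
        yield {"file": name, "line": i}


# extend() iterates the generator to exhaustion right away,
# so every line ends up in memory at once.
eager = []
eager.extend(fake_log_lines("a.log", 3))
eager_consumed = len(consumed)

# append() only stores the generator object; nothing is read yet.
consumed.clear()
lazy = []
lazy.append(fake_log_lines("b.log", 3))
lazy.append(fake_log_lines("c.log", 3))
lazy_consumed_before = len(consumed)

# chain.from_iterable walks the stored generators one line at a time.
merged = chain.from_iterable(lazy)
first = next(merged)
lazy_consumed_after = len(consumed)
```

With `append`, lines are only read when something iterates over `merged`, which is also the point at which any storage read errors would surface.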


tw4l commented on June 21, 2024

First pass is implemented in #682.

We'll want to move to properly streaming the logs, which is currently blocked by aio-libs/aiobotocore#991.


ikreymer commented on June 21, 2024

Until the aiobotocore issue is resolved, we may be able to use the sync download option, since we've already implemented it to support collection downloads: https://github.com/webrecorder/browsertrix-cloud/blob/main/backend/btrixcloud/storages.py#L358


tw4l commented on June 21, 2024

Implemented as a sync stream in #1168. Closing for now, though we may eventually want to make this async.
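
For readers following along, a sync stream of this kind might look roughly like the sketch below. This is not the code from #1168: the helper names are hypothetical, the logs are assumed to be newline-delimited JSON, and `io.BytesIO` stands in for a file-like object with `read(size)` (such as the body of a sync S3 download).

```python
import io
import json
from typing import BinaryIO, Iterable, Iterator


def read_chunks(fileobj: BinaryIO, chunk_size: int = 16) -> Iterator[bytes]:
    """Read a file-like object in fixed-size chunks instead of all at once."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_lines_from_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Reassemble lines that may be split across chunk boundaries."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, pending = pending.split(b"\n")
        yield from complete
    if pending:
        yield pending


def iter_log_dicts(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed dict per JSON line, skipping blank or invalid lines."""
    for line in iter_lines_from_chunks(chunks):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


raw = (
    b'{"level": "info", "message": "start"}\n'
    b"not json\n"
    b'{"level": "error", "message": "boom"}'
)
entries = list(iter_log_dicts(read_chunks(io.BytesIO(raw))))
```

Only one chunk plus the current partial line is held in memory at a time, as long as the caller iterates over `iter_log_dicts` rather than collecting it into a list as the demo line does.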


tw4l commented on June 21, 2024

There still seems to be a memory issue; looking into it.

We could just fetch from presigned URLs.

