Comments (6)
@Chickensoupwithrice In your court now if you want to try to figure this out :)
from browsertrix.
Alright, after much experimenting I've managed to nail down exactly where we stop using generators and instead load all the logs into memory. It's in the way we're calling stream_log_bytes_as_line_dicts (which by itself does return a generator): we're extending an array with the output of the generator, which loads the entire log file into memory until we run out of memory.
Switching extend to append does mean we're generators all the way down, but then when I try to execute on this generator, I get read timeouts on the DO space?
Still investigating.
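To illustrate the difference described above, here is a minimal sketch. fake_log_lines is a hypothetical stand-in for stream_log_bytes_as_line_dicts (the real function is not shown in this thread), and itertools.chain is used here as one way to keep things lazy; it is not necessarily what the actual fix does.

```python
from itertools import chain
from typing import Iterable, Iterator


def fake_log_lines(name: str, count: int, pulled: list) -> Iterator[dict]:
    """Hypothetical stand-in for stream_log_bytes_as_line_dicts: yields one dict per line."""
    for i in range(count):
        pulled.append((name, i))  # record that this line was actually read
        yield {"file": name, "line": i}


def eager_collect(streams: Iterable[Iterator[dict]]) -> list:
    """The problematic pattern: extend() drains every generator into one list."""
    lines: list = []
    for stream in streams:
        lines.extend(stream)
    return lines


def lazy_collect(streams: Iterable[Iterator[dict]]) -> Iterator[dict]:
    """A lazy alternative: chain the generators without materializing them."""
    return chain.from_iterable(streams)


pulled_eager: list = []
eager = eager_collect(fake_log_lines(f"log{n}", 3, pulled_eager) for n in range(2))
# Every line of every log has been read before the caller consumes anything.

pulled_lazy: list = []
lazy = lazy_collect(fake_log_lines(f"log{n}", 3, pulled_lazy) for n in range(2))
first = next(lazy)
# Only the single requested line has been read so far.
```

With extend, memory use grows with the total size of the logs; with the chained version it stays at roughly one line at a time, as long as nothing downstream collects the results into a list.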
First pass is implemented in #682.
We'll want to move to properly streaming logs, currently blocked by aio-libs/aiobotocore#991
Until the aiobotocore issue is resolved, we may be able to use the sync download option, since we've already implemented this to support collection downloads via https://github.com/webrecorder/browsertrix-cloud/blob/main/backend/btrixcloud/storages.py#L358
Implemented as a sync stream in #1168. Closing for now, though we may eventually want to make this async.
Still seems to be a memory issue, looking into it.
Could just fetch from presigned URLs.
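If the logs were fetched over HTTP from a presigned URL, the response body could be parsed incrementally rather than read whole. A rough sketch, assuming the logs are newline-delimited JSON (suggested by the name stream_log_bytes_as_line_dicts, but not confirmed here); iter_log_dicts is a hypothetical helper, not existing project code.

```python
import json
from typing import Iterable, Iterator


def iter_log_dicts(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed dict per JSON line, buffering at most one partial line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Flush a final line that has no trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Chunk boundaries deliberately fall in the middle of lines.
chunks = [
    b'{"level": "info", "me',
    b'ssage": "start"}\n{"level": "error"',
    b', "message": "boom"}\n',
]
entries = list(iter_log_dicts(chunks))
```

The chunks argument can be any iterable of bytes, for example the chunked body of a streamed HTTP response, so the helper itself never needs the full file in memory.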