Comments (9)
I think there is currently some very crude truncation going on to keep the log for each individual Task fairly small, but those logs can still pile up if you have a bunch of tasks.
I'm in favor of adding configuration around sending logs and, if this is a common problem, a configurable maximum size for emails. Both should be tracked as their own issues.
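A max-size setting like the one suggested could look roughly like this sketch; `MAX_EMAIL_LOG_BYTES` and the helper name are illustrative, not existing dagobah options:

```python
# Hypothetical guard applied to a task log before it is emailed.
# MAX_EMAIL_LOG_BYTES is an assumed config value, not a real dagobah setting.
MAX_EMAIL_LOG_BYTES = 64 * 1024

def truncate_log_for_email(log_text, limit=MAX_EMAIL_LOG_BYTES):
    """Keep only the tail of the log, which usually contains the failure."""
    data = log_text.encode("utf-8")
    if len(data) <= limit:
        return log_text
    tail = data[-limit:].decode("utf-8", errors="ignore")
    return "[log truncated to last %d bytes]\n%s" % (limit, tail)
```

Keeping the tail rather than the head is a judgment call; the end of a log is usually where the error lives.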
from dagobah.
Yeah, this was another feature I was thinking of. I know our processes will probably produce a lot of STDOUT, so sending it directly in the email, while convenient, can be a bit much.
With utkarsh's logs code, you could have an email template that links to the logs in the web UI from the email, instead of putting them directly in the email.
> With utkarsh's logs code, you could have an email template that links to the logs in the web UI from the email, instead of putting them directly in the email.
I love this idea. Would reduce the need for both of those email config vars, I would think.
Having a link would be cool, but you still have the problem of people who want the log in the email.
To fix this issue: if we get an SMTPDataError, send the email again with the log replaced by the sentence "The log was too big to be sent by your email server."
If you are cool with that, I will work on it and submit a pull request.
Edit: English
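The retry described above could be sketched roughly like this; `build_message` is an assumed rendering helper, not part of dagobah's actual email code:

```python
import smtplib

PLACEHOLDER = "The log was too big to be sent by your email server."

def send_with_fallback(server, from_addr, to_addrs, build_message, log_text):
    """Send the email with the full log; if the server rejects it at the
    DATA stage (typically a 552 "message size exceeds limit" response),
    resend with the log replaced by a short notice."""
    try:
        server.sendmail(from_addr, to_addrs, build_message(log_text))
    except smtplib.SMTPDataError:
        # Oversized messages surface as SMTPDataError in smtplib.
        server.sendmail(from_addr, to_addrs, build_message(PLACEHOLDER))
```

Note this only catches size rejections that arrive as `SMTPDataError`; some servers drop the connection instead, which would need separate handling.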
Yes, in either case there should be detection for the email being too large, so +1 @surbas
I think we might be hitting this issue with one of our jobs, so this is on our radar. (We have a long-running job every morning that produces a lot of output and doesn't end up having an email sent.) Although, looking through the dagobah log, we aren't getting these errors; we're getting something like:
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /job/540e27e27eb4da4753d3bba4/Run%20wh%20job HTTP/1.1" 200 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/css/task_detail.css HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/js/task_detail.js HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/lib/Kickstrap1.3.2/Kickstrap/js/kickstrap.min.js HTTP/1.1" 404 -
192.168.144.165 - - [14/Nov/2014 10:48:48] "GET /api/logs?job_name=wh-log-transfer&task_name=Run+wh+job HTTP/1.1" 200 -
database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 47, in wrapper
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/api.py", line 122, in tail_task
    return task.tail(**call_args)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 909, in tail
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Exception on /api/tail [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.6/site-packages/flask_login.py", line 663, in decorated_view
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 54, in wrapper
    raise e
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
I see a few of these OperationFailure errors; here's another:
Exception in thread Thread-3769:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 736, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 824, in check_complete
    complete_time=datetime.utcnow())
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1027, in _task_complete
    self.parent_job._complete_task(self.name, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 547, in _complete_task
    self._on_completion()
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 592, in _on_completion
    self._serialize(include_run_logs=True))
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 639, in _serialize
    for task in self.tasks.itervalues()]
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1044, in _serialize
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
The exact issue here is probably a Mongo server-side bug, but we do need a better way in general for handling giant logs.
https://www.google.com/search?q=mongo+overflow+sort+stage
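For what it's worth, this particular OperationFailure is MongoDB hitting its ~32 MB in-memory sort limit, and the usual workaround is to index the field the log query sorts on so the sort can walk the index instead. A hedged sketch — the `save_date` field name is a guess, not necessarily dagobah's actual schema:

```python
DESCENDING = -1  # same value as pymongo.DESCENDING

def ensure_sort_index(collection, field="save_date"):
    """Create a descending index on the sort field so Mongo can use the
    index for ordering instead of buffering the whole result set in
    memory. `collection` is expected to behave like a pymongo Collection."""
    return collection.create_index([(field, DESCENDING)])
```

With pymongo this would be called once at backend startup, e.g. `ensure_sort_index(db["dagobah_log"])` (collection name assumed).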
@rclough Do you think it makes more sense to drop logs over a certain size and warn the user, or to implement something that can actually handle giant logs? We could try using GridFS in Mongo, but as far as I know SQLite will be constrained by whatever maximum size we set for that column. No idea how large we can go, but it seems inefficient.
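If GridFS turns out to be the right direction, the shape of it is roughly this; the function names and filename convention are illustrative, and `fs` is expected to behave like pymongo's `gridfs.GridFS`:

```python
def store_run_log(fs, job_name, log_bytes):
    """Store an oversized run log via GridFS. GridFS splits files into
    ~255 KB chunks, sidestepping Mongo's 16 MB single-document cap."""
    return fs.put(log_bytes, filename="%s.log" % job_name)

def fetch_run_log(fs, file_id):
    """Read the stored log back as bytes via the returned file id."""
    return fs.get(file_id).read()
```

In real use `fs` would be `gridfs.GridFS(MongoClient()["dagobah"])`; `put` returns the file's `_id` and `get` returns a file-like object.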
I feel like if you are going to be running jobs with huge logs, you probably want a backend more robust than SQLite. That said, being able to get the full logs is pretty important, I think. A short-term fix would be dropping large logs with a warning, but oftentimes in my experience there's not much option to fix the underlying job (i.e., you would be sacrificing necessary info for job failures).
I don't know how GridFS works, but from a quick glance it seems like a cool idea. It might be handy to have emails link to the log in dagobah if the log is too big.