Comments (9)

thieman commented on June 19, 2024

I think there is currently some very crude truncation going on in an attempt to keep the logs for each individual Task pretty small. These could still pile up if you have a bunch of tasks.

I'm in favor of adding configuration around sending logs and, if this is a common problem, configuring a max size on emails. Both should be their own issues.
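Roughly the kind of cap I have in mind, as a sketch only; the max_log_bytes knob and the point where it would hook into the emailer are placeholders, not existing dagobah config:

# Hypothetical sketch: cap each task's captured output before it goes into
# the email body. `max_log_bytes` is an assumed config value.
def truncate_log(log_text, max_log_bytes=50 * 1024):
    """Return the tail of the log, with a note if anything was cut."""
    encoded = log_text.encode('utf-8')
    if len(encoded) <= max_log_bytes:
        return log_text
    tail = encoded[-max_log_bytes:].decode('utf-8', 'ignore')
    return '[... truncated to last %d bytes ...]\n%s' % (max_log_bytes, tail)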

rclough commented on June 19, 2024

Yeah this was another feature I was thinking of. I know our processes will probably have a lot of STDOUT, so sending it directly in the email, while convenient, can be a bit much.

With utkarsh's logs code, you could have an email template that links to the logs in the web UI from the email, instead of putting them directly in the email.
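Very rough idea of what building that link could look like; the base-URL setting and the /job/<id>/<task> route are assumptions for illustration, not confirmed parts of the web UI:

# Hypothetical: compose a link to the task's log page in the web UI instead
# of pasting stdout into the email. `web_ui_url` is an assumed setting.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

def log_link(web_ui_url, job_id, task_name):
    return '%s/job/%s/%s' % (web_ui_url.rstrip('/'), job_id, quote(task_name))

# e.g. log_link('http://dagobah.example.com:9755', job_id, 'Run wh job')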

thieman commented on June 19, 2024

With utkarsh's logs code, you could have an email template that links to the logs in the web UI from the email, instead of putting them directly in the email.

I love this idea. Would reduce the need for both of those email config vars, I would think.

surbas commented on June 19, 2024

Having a link would be cool, but you still have the problem of people who want it in the email.
To fix this issue, if we get an SMTPDataError, send the email again with the log replaced with the sentence "The log was too big to be sent by your email server".
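Something like this is what I mean, just as a sketch; the SMTP host, addresses, and body layout are placeholders, not dagobah's actual email settings:

import smtplib
from email.mime.text import MIMEText

# Sketch of the proposed fallback: if the server rejects the DATA (typically
# for size), resend the message with the log swapped for a short notice.
def send_report(task_name, log_text,
                host='localhost', sender='dagobah@example.com',
                recipient='ops@example.com'):
    def send(body):
        msg = MIMEText(body)
        msg['Subject'] = 'Task report: %s' % task_name
        msg['From'] = sender
        msg['To'] = recipient
        smtp = smtplib.SMTP(host)
        try:
            smtp.sendmail(sender, [recipient], msg.as_string())
        finally:
            smtp.quit()
    try:
        send('Task %s finished.\n\n%s' % (task_name, log_text))
    except smtplib.SMTPDataError:
        send('Task %s finished.\n\n'
             'The log was too big to be sent by your email server.' % task_name)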

If you are cool with that, I will work on it and submit a pull request.

Edit: English

rclough commented on June 19, 2024

Yes, in either case there should be detection for the email being too large, so +1 @surbas

rclough commented on June 19, 2024

I think we might be hitting this issue with one of our jobs, so this is on our radar. (We have a long-running job every morning that produces a lot of output and doesn't end up having an email sent.) Although, looking through the dagobah log, we aren't getting those errors; we're getting something like:

192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /job/540e27e27eb4da4753d3bba4/Run%20wh%20job HTTP/1.1" 200 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/css/task_detail.css HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/js/task_detail.js HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/lib/Kickstrap1.3.2/Kickstrap/js/kickstrap.min.js HTTP/1.1" 404 -
192.168.144.165 - - [14/Nov/2014 10:48:48] "GET /api/logs?job_name=wh-log-transfer&task_name=Run+wh+job HTTP/1.1" 200 -
database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 47, in wrapper
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/api.py", line 122, in tail_task
    return task.tail(**call_args)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 909, in tail
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Exception on /api/tail [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.6/site-packages/flask_login.py", line 663, in decorated_view
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 54, in wrapper
    raise e
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes

I see a few of these OperationFailure errors; here's another:

Exception in thread Thread-3769:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 736, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 824, in check_complete
    complete_time=datetime.utcnow())
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1027, in _task_complete
    self.parent_job._complete_task(self.name, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 547, in _complete_task
    self._on_completion()
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 592, in _on_completion
    self._serialize(include_run_logs=True))
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 639, in _serialize
    for task in self.tasks.itervalues()]
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1044, in _serialize
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes

thieman commented on June 19, 2024

The exact issue here is probably a Mongo server-side bug, but we do need a better way in general for handling giant logs.

https://www.google.com/search?q=mongo+overflow+sort+stage
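For reference, that error means the query's in-memory sort blew past Mongo's 32 MB sort buffer; the usual workaround is to index whatever field get_latest_run_log sorts on, so Mongo can walk the index instead of buffering the whole result set. Rough pymongo sketch, with the collection and field names guessed rather than taken from the actual backend schema:

from pymongo import MongoClient, DESCENDING

db = MongoClient('localhost', 27017)['dagobah']

# Assumed names for illustration; the real ones are in dagobah/backend/mongo.py.
# With an index on the sorted field, the sort no longer has to buffer everything
# in memory, which is what trips the 33554432-byte limit.
db['dagobah_log'].create_index([('save_date', DESCENDING)])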

thieman commented on June 19, 2024

@rclough Do you think it makes more sense to just drop logs over a certain size and warn the user, or to implement something that could actually handle giant logs? We could try using GridFS in Mongo, but as far as I know SQLite is going to be constrained by whatever maximum size we set for that column. No idea how large we can go, but it would probably be inefficient.
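For what it's worth, GridFS splits a blob across chunked documents, so a stored log isn't bound by the 16 MB document limit. Minimal pymongo sketch, with the collection name and filename purely illustrative:

import gridfs
from pymongo import MongoClient

db = MongoClient('localhost', 27017)['dagobah']
fs = gridfs.GridFS(db, collection='task_logs')  # data lands in task_logs.files / task_logs.chunks

# Store a potentially huge log and keep only the returned id in the run record.
log_id = fs.put(b'...lots of stdout...', filename='wh-log-transfer/Run wh job')

# Later, stream it back for the web UI (or an email that just links to it).
log_text = fs.get(log_id).read()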

rclough commented on June 19, 2024

I feel like if you are going to be running jobs with huge logs, you probably want a backend more robust than SQLite. That said, being able to get the full logs is pretty important, I think. A quick fix would be dropping large logs with a warning, but oftentimes in my experience there's not much option to fix it (i.e., we would be sacrificing necessary info for job failures).

I don't know how GridFS works, but from a quick glance it seems like a cool idea. It might be handy to have emails include a dagobah link to the log if the log is too big.
