Comments (23)
After looking into the code, I tested other backends instead of dbm, but the pickling errors remain. The only difference is the type that causes the problem: for JSON and SQLite it is a generator:
[... stripped duplicate lines ...]
```
  File "C:\Anaconda\envs\surface-classification\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda\envs\surface-classification\lib\pickle.py", line 748, in save_global
    (obj, module, name))
PicklingError: Can't pickle <type 'generator'>: it's not found as __builtin__.generator
```
from doit.
From https://docs.python.org/2/library/multiprocessing.html#windows
Ensure that all arguments to Process.__init__() are picklable. This means, in particular, that bound or unbound methods cannot be used directly as the target argument on Windows; just define a function and use that instead.
It seems doit fails on that because the target argument is a method, MRunner.execute_task_subprocess; see https://github.com/pydoit/doit/blob/master/doit/runner.py#L388
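Because these errors only appear when a child process receives its target by pickling (the "spawn" start method, the only one available on Windows), a cheap pre-flight check can be run on any platform. The sketch below is illustrative and not part of doit: assert_spawn_safe is a hypothetical helper, and plain pickle.dumps only approximates multiprocessing's own pickler, which registers a few extra reducers. On Linux, calling multiprocessing.set_start_method("spawn") before starting any processes is another way to reproduce the Windows behaviour.

```python
import pickle
from typing import Any, Callable, Tuple


def assert_spawn_safe(target: Callable, args: Tuple[Any, ...] = ()) -> None:
    """Hypothetical pre-flight check for multiprocessing.Process(target, args).

    Under the "spawn" start method the target and its arguments are pickled
    and sent to the child. Under "fork" the child inherits a copy of the
    parent's memory, so problems like these stay hidden.
    """
    try:
        pickle.dumps(target)
    except Exception as exc:
        raise ValueError(f"target {target!r} is not picklable: {exc}") from exc
    for index, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except Exception as exc:
            raise ValueError(
                f"args[{index}] of type {type(arg).__name__} is not picklable: {exc}"
            ) from exc
```

For example, assert_spawn_safe(len, ([1, 2, 3],)) passes silently, while passing a generator as an argument raises ValueError naming args[0].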
You're going to need to refactor the code to make all arguments to Process picklable. It might not be an easy task, because almost all of doit's code will come under that... Sorry, but I don't use Windows, so don't expect help from me.
from doit.
I guess an easier path would be to implement a "runner" using futures [1] (note there is a backport for Python 2 on PyPI). With futures it is possible to delay the creation of processes until the point where the actual task is executed, so you would have to worry about pickling far less (and doit actually already takes care of that).
[1] https://docs.python.org/3/library/concurrent.futures.html
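A futures-based runner could be sketched as follows, assuming each task can be reduced to a module-level function plus picklable arguments. run_tasks and the (name, function, args) task tuples are hypothetical, not doit's API. Only what is passed to submit() crosses the process boundary, so the runner object itself never has to be pickled.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical task description: (task name, module-level callable, args).
TaskSpec = Tuple[str, Callable[..., Any], Tuple[Any, ...]]


def run_tasks(
    specs: Iterable[TaskSpec],
    executor_factory: Callable[..., Any] = ProcessPoolExecutor,
    max_workers: int = 2,
) -> Dict[str, Any]:
    """Run independent tasks in a pool and collect results by task name.

    Only each task's callable and args are pickled when submitted; the
    runner, its reporter and its dependency DB stay in the parent process.
    """
    results: Dict[str, Any] = {}
    with executor_factory(max_workers=max_workers) as pool:
        futures = {pool.submit(func, *args): name for name, func, args in specs}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results


def square(x: int) -> int:
    """Example task body; module-level so it can be pickled by reference."""
    return x * x
```

On Windows, call run_tasks([("a", square, (3,)), ("b", square, (4,))]) from inside an if __name__ == "__main__": guard, because spawned workers re-import the main module. Passing concurrent.futures.ThreadPoolExecutor as executor_factory runs the same code in threads, which is handy for testing. Dependency ordering between tasks is deliberately left out of this sketch.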
from doit.
Thank you for your feedback!
> You're going to need to refactor the code to make all arguments to Process picklable. It might not be an easy task, because almost all of doit's code will come under that... Sorry, but I don't use Windows, so don't expect help from me.
This seems to be more work than I can do at the moment, so I will stick with my current setup which uses waf (my project requires parallel processing to achieve bearable runtimes for parameter optimization).
I was very excited when I read about doit in this Software Carpentry lesson, as all other build tools I tried so far (SCons, waf) have some quirks when it comes to workflow automation. In addition, the doit approach of pure Python without another domain-specific language is very appealing.
It is a bit unfortunate that Windows support is often lacking, especially for a tool without binary dependencies, but I know that the workarounds for the missing fork functionality in Windows are problematic. I will leave this issue open to provide a starting point for other Windows users.
from doit.
> It is a bit unfortunate that Windows support is often lacking
Yes, it is unfortunate that Windows users expect projects to work on Windows but are not willing to contribute themselves...
from doit.
> Yes, it is unfortunate that Windows users expect projects to work on Windows but are not willing to contribute themselves...
Please don't get bitter. I have been working with the scientific Python stack for several years now and I am used to all sorts of problems with binary extensions (GCC vs MSVC) on Windows, so a pure Python project (to me) promises better cross-platform support.
I don't know what state data has to be distributed to the worker processes in doit, and I currently don't have the time (publication deadlines) to work through the codebase far enough to fix this issue.
I don't expect you to fix all Windows-related issues; I just reported one in the hope that there might be an easy fix, and to provide a starting point for someone willing to fix it (this might even be me in a couple of weeks). Documenting this issue also saves time for Windows users looking for parallel execution (I searched the mailing list for Windows issues and found only satisfied users; maybe everybody uses Linux).
Keep up the good work. :-)
from doit.
It doesn't look like the whole thing is that far from being picklable. You can fix the dbm-related problems by replacing:
```python
self._set = self.backend.set
self._get = self.backend.get
self.remove = self.backend.remove
self.remove_all = self.backend.remove_all
self._in = self.backend.in_
```
with:
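```python
# Methods on the same class: pickling the instance no longer has to pickle
# the backend's bound methods stored as attributes, which Python 2's pickle
# cannot handle.
def _set(self, *args, **kwargs):
    return self.backend.set(*args, **kwargs)

def _get(self, *args, **kwargs):
    return self.backend.get(*args, **kwargs)

def remove(self, *args, **kwargs):
    return self.backend.remove(*args, **kwargs)

def remove_all(self, *args, **kwargs):
    return self.backend.remove_all(*args, **kwargs)

def _in(self, *args, **kwargs):
    return self.backend.in_(*args, **kwargs)
```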
Beyond that, the following objects seem to have trouble pickling:
- `<DB object at 0x0378EF20>` of type `<type 'DB'>`: I assume this is something buried in the dbm module
- `<open file '<stdout>', mode 'w' at 0x0159D078>` of type `<type 'file'>`: this almost certainly doesn't need to be pickled
- `<generator object _dispatcher_generator at 0x03741E68>` of type `<type 'generator'>`: generators can never be pickled, but this one seems to be a top-level thing
Edit: removed some errors that were the result of our application code.
Do the subprocesses need access to the DB, the output stream and the dispatch generator? If not, it would be easy to write custom pickle functions that just ignore them.
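Such custom pickle functions could look like the sketch below. ConsoleReporter is a hypothetical stand-in for doit's reporter (its attribute names mirror the reporter dict dumped later in this thread, but the class is not doit's code): __getstate__ drops the output stream before pickling, and __setstate__ re-attaches sys.stdout on the receiving side. The same pattern would apply to a DB handle or a generator attribute.

```python
import sys
from typing import Any, Dict, Optional, TextIO


class ConsoleReporter:
    """Hypothetical reporter whose output stream is not sent to subprocesses."""

    # Attributes that cannot (or need not) cross a process boundary.
    _transient = ("outstream",)

    def __init__(self, outstream: Optional[TextIO] = None, show_out: bool = True):
        self.outstream = outstream if outstream is not None else sys.stdout
        self.show_out = show_out
        self.failures = []

    def __getstate__(self) -> Dict[str, Any]:
        # Pickle a copy of the instance dict without the transient attributes.
        state = self.__dict__.copy()
        for name in self._transient:
            state.pop(name, None)
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        # Re-create the dropped stream on the receiving side.
        self.outstream = sys.stdout
```

The trade-off is that anything dropped in __getstate__ must be cheap to rebuild in the child, or genuinely unused there.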
from doit.
@gstorer how did you get this list of objects that fail to pickle? Can you get more details on which objects contain them?
I guess the first step to fixing things for Windows would be to set up CI for Windows, such as AppVeyor.
doit already removes unpicklable stuff from the Task object, because that is also required on Linux...
I guess nothing but a Task object is really needed in a subprocess. It is just a matter of cleaning up the objects to make sure they don't contain unpicklable stuff...
from doit.
+1 for AppVeyor. It should be really simple to configure after you log in with them and create the project there (you can use py3.4 for doit; we can't because lxml does not have a 3.4 wheel).
from doit.
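I put the following code at the top of _run_start_processes (partially copied from http://stackoverflow.com/questions/6589869/how-to-find-source-of-error-in-python-pickle-on-massive-object):
```python
# Python 2: relies on the pure-Python pickle.Pickler and StringIO.
import pickle
import StringIO


class MyPickler(pickle.Pickler):
    def save(self, obj):
        # Log every object as it is visited so the failing one can be spotted.
        print 'pickling object', obj, 'of type', type(obj)
        try:
            pickle.Pickler.save(self, obj)
        except Exception:
            print 'error. skipping...'


# `self` is the runner instance inside _run_start_processes.
pickler = MyPickler(StringIO.StringIO())
pickler.dump(self)
```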
DB is from the class dict:
```
{'saved_dbc_key': None, 'db': <DB object at 0x035E4F20>, 'dbc': None, '_kill_iteration': False, '_in_iter': 0, '_cursor_refs': {}} of type <type 'dict'>
```
which looks like it's the class _DBWithCursor from the Python library http://svn.python.org/projects/python/trunk/Lib/bsddb/__init__.py
stdout is from the class dict:
```
{'runtime_errors': [], 'failures': [], 'show_out': True, 'outstream': <open file '<stdout>', mode 'w' at 0x0269D078>, 'show_err': True} of type <type 'dict'>
```
I think this is the reporter class, but you'd know better than me.
_dispatcher_generator is from the class dict:
```
{'tasks': {'startup': <Task: startup>, 'shutdown': <Task: shutdown>, 'align_profiles': <Task: align_profiles>}, 'generator': <generator object _dispatcher_generator at 0x0358BAD0>, 'waiting': set([]), 'ready': deque([]), 'nodes': {}, 'targets': {}} of type <type 'dict'>
```
which is almost certainly TaskDispatcher.
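To see which objects contain the offenders, a Python 3 variant of the same idea can report the attribute path to each unpicklable value instead of printing every visited object. In Python 3, pickle.Pickler is implemented in C and does not call an overridden save(), so the subclassing trick above only works with Python 2. find_unpicklable is a hypothetical debugging helper, not part of doit or the standard library; it only descends into instance __dict__s, dicts, lists, tuples and sets, so anything else that fails is reported as a whole.

```python
import pickle
from typing import Any, List, Optional, Set


def find_unpicklable(obj: Any, path: str = "obj", _seen: Optional[Set[int]] = None) -> List[str]:
    """Return the paths under `obj` whose values fail to pickle.

    A subtree that pickles cleanly is not descended into. If an object fails
    but none of its children do, the object itself is reported.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return []  # already visited (guards against reference cycles)
    _seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []
    except Exception:
        pass

    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [(f"{path}[{i}]", item) for i, item in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    found: List[str] = []
    for child_path, child in children:
        found.extend(find_unpicklable(child, child_path, _seen))
    return found or [path]
```

For a TaskDispatcher-like dict such as {"tasks": {...}, "generator": <generator>}, it reports the "generator" key rather than the whole dict.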
I can't really assist with CI, but I'm probably going to get this fixed one way or another, and it would be nice to push the changes back to the project.
from doit.
> I can't really assist with CI, but I'm probably going to get this fixed one way or another, and it would be nice to push the changes back to the project.
Sure, patches are welcome. The CI I mentioned is a service; you don't need to set up anything yourself. Just add a configuration file and it will automatically run the tests after every commit/PR on GitHub.
from doit.
Alright, I think I've fixed it. Some of the tests fail on Windows, though one fewer fails after the fix :). I don't appear to have made things any worse.
There was an extra generator item failing the pickle. I had to put the logging into the Python pickle library directly in order to find it.
We also had to fix up some of our application's task functions to ensure they are picklable as well.
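For task functions, the usual fix is to replace lambdas and nested functions, which pickle cannot look up by name, with module-level functions whose arguments are passed separately. A minimal dodo.py sketch, assuming doit's (callable, args) python-action tuple form; scale_file and task_scale are hypothetical names:

```python
from typing import Dict


def scale_file(path: str, factor: int) -> Dict[str, str]:
    """Hypothetical task body. Being module-level, it pickles by reference."""
    # A real action would read and rewrite `path`; this one only reports.
    return {"scaled": f"{path} x{factor}"}


def task_scale():
    """Hypothetical doit task using a picklable python-action."""
    # Instead of: "actions": [lambda: scale_file("data.csv", 2)]
    # pass the callable and its positional args as a tuple.
    return {"actions": [(scale_file, ["data.csv", 2])]}
```

functools.partial(scale_file, "data.csv", 2) is another picklable option, as long as the wrapped function is itself module-level.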
That CI looks neat, but I've already spent too much time on this. We'll be using this library pretty heavily over the next month, so maybe I'll look into it, but I wouldn't hold your breath.
from doit.
@joschkazj can you help test the pull request?
from doit.
Sure, I will look into it next week.
from doit.
related: #67 (Support usage of lambda in tasks actions with multi-process runner)
from doit.
WIP in this branch: https://github.com/pydoit/doit/compare/test
Next step: why do the tests get stuck on py27? https://ci.appveyor.com/project/schettino72/doit/build/job/gsqqqe94ih2hocms
from doit.
After running the tests in no-capture mode I could see the error message. The problem is a bug in py27: http://bugs.python.org/issue10845
I tried to implement a workaround, without success. No idea why this was never a problem for you guys...
So unless I get some help on this, I will just disable multiprocessing when using Python 2.7 on Windows.
from doit.
You could ask for a 2.7 backport.
from doit.
> You could ask for a 2.7 backport.
I guess Python 2.7 can only receive security fixes, so this would not be accepted... And I don't really care enough to push this forward. I don't use Python 2.7 anymore, and I have never used Windows to run Python code :D
from doit.
I might have a bit of time to look at the tests this week. Did you end up resolving anything with anonymous code, e.g. with cloudpickle?
from doit.
No, I didn't deal with cloudpickle. That's in another ticket, and up for grabs :) I won't work on it anytime soon...
from doit.
Well, this took longer to track down than I would've liked (so no cloudpickle). The fix is here: #84. There was also another pickling issue, which I'm sure you can fix in a more elegant way. Feel free to close the pull request.
from doit.
Update: It now works on python 2.7 too.
I also started using cloudpickle, but Windows still doesn't make use of it when forking (or whatever Windows uses). So there is still some room for improvement.
Please test the latest master again and let me know.
from doit.