
Comments (5)

gahobbsau commented on June 18, 2024

This issue refers to the cell at 2.2.3 (c), Apply triple-barrier method.
I found a fix, and the reason that running this cell hangs on my Windows set-up, which is similar to the specs listed above.

  1. First, the simple fix: at the line cpus = cpu_count() - 1, change it to cpus = 1.
  • This invokes single-thread execution for debugging [20.8] at def processJobs_(jobs).
  • On my laptop this cell runs in a second or two in single-thread execution.
  • No hang. Problem solved.
  2. The reasons for the hang, and a broader fix, sit at a couple of levels (based on my set-up):
    a) cpus = cpu_count() - 1 may not return a value. It didn't when I was first running the cell. (A Google search will show that the behaviour of multiprocessing.cpu_count() is not reliable across a range of hardware and OS combinations.)
    b) getEvents calls mpPandasObj for multiprocessing, and in mpPandasObj the default value of numThreads (if it gets used) is 24, which exceeds the specs above (CPU cores: 8).
  • In this Notebook, at def mpPandasObj, numThreads could be changed to 1.
  • You will see in def mpPandasObj that "if numThreads==1:out=processJobs_(jobs)" calls processJobs_ in the following cell, 20.8, for single-thread processing.

    c) I returned to another session after a computer restart and found that cpus = cpu_count() - 1 does return the correct value of 7 cpus. However, instead of hanging, the cell now displays a series of errors in the Jupyter session console, each of which mentions a module in the environment folder for the multiprocessing modules (Anaconda3\lib\multiprocessing\ in my current case).

  • This indicates to me that the multiprocessing code from ch 20 that is included in this Notebook cannot be presumed to match computer specs such as those in the original post above.
  • This seems to be confirmed at 20.5, page 309 of the book, where it states: "In this section, we will study one such engine, and once you understand the logic, you will be ready to develop your own, including all sorts of customized properties."
  • In the meantime, rather than stumble around with the code in ch 20, I will just set the code to run in single-thread mode.
  3. Debugging
    a) I was able to detect this issue without too much trouble by debugging getEvents(): I ran the body of the def one line at a time, with the variable values in the state they had reached at 2.2.3 (c).
    b) I used the set-up that I described in the Suggestions to my other issue post, "Bars notebook - possible corrections", particularly the QtConsole and the Variable Inspector, and saved data variables to CSV files to inspect what was going on.
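
The single-thread fix in point 1 and the worker-count concern in point 2 can be sketched roughly as follows. This is not the book's code: safe_worker_count, process_jobs_serial and scale are hypothetical names, and the job layout (a dict with a "func" key plus keyword arguments) is an assumption made for illustration only.

```python
import os


def safe_worker_count(requested=None, reserve=1):
    """Pick a worker count that never exceeds the cores reported by the OS."""
    # os.cpu_count() is documented to return None when the count is unknown,
    # so fall back to 1 instead of doing arithmetic on None.
    available = os.cpu_count() or 1
    limit = max(1, available - reserve)
    if requested is None:
        return limit
    return max(1, min(requested, limit))


def process_jobs_serial(jobs):
    """Run every job in the current process, one after another (easy to debug)."""
    results = []
    for job in jobs:
        kwargs = dict(job)          # copy so the caller's dict is left untouched
        func = kwargs.pop("func")   # hypothetical convention: callable under "func"
        results.append(func(**kwargs))
    return results


def scale(values, factor):
    """Toy stand-in for the real per-chunk computation."""
    return [v * factor for v in values]


jobs = [
    {"func": scale, "values": [1, 2], "factor": 10},
    {"func": scale, "values": [3], "factor": 10},
]

workers = safe_worker_count(requested=24)  # a request of 24 is capped to the machine
out = process_jobs_serial(jobs)            # the single-process path used as the fix
```

Capping the requested count and keeping a plain serial path means the same cell can run anywhere, with parallelism treated as an optional extra rather than a requirement.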

I trust that this may assist BlackArbsCEO and others who want to run the code in the Labelling Notebook.
And to BlackArbsCEO, thank you again for sharing your implementations in the 2 notebooks. They have helped greatly in understanding the book, and contribute towards the possibility of applying the work from it.

from adv_fin_ml_exercises.

BlackArbsCEO commented on June 18, 2024

Sorry, I don't use Windows, because of issues like the ones you're experiencing, in addition to other headaches I have had with it in the past. I recommend learning how to use Ubuntu. It is much easier to diagnose and fix issues there, in my opinion.


aldebaransearch commented on June 18, 2024

It definitely also seems like there are problems with the multiprocessing functions in chapter 20. I have actually seen errors like this on both Windows and Linux, using the library one can patch together from the code snippets in the book. Some tasks run fine and give far better utilization of the machine than running a single process. Others go into some kind of infinite loop, spawning more and more processes. Maybe check how many processes you have running, @Chetanbuye12.

I crashed a 72-core Linux server using the exact code, giving the mpPandasObj function numThreads=20; just before it crashed, more than 500 Python processes were running.

If someone has an idea of what is going on, and of how the code in chapter 20 should be changed to fit different architectures, I would be very interested in hearing possible causes.


gahobbsau commented on June 18, 2024

I looked into this further after continuing to get errors with cpus set to any value other than 1.
ERROR Traceback:
Process SpawnPoolWorker-3:
. . .
AttributeError: Can't get attribute 'expandCall' on <module '__main__' (built-in)>

The posts below indicate that the error arises because Jupyter on Windows runs in interactive mode, and Python multiprocessing does not work on Windows in interactive mode.
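
One way to read that AttributeError, offered as a sketch rather than a diagnosis of the notebook itself: pickle stores a function as a reference (module name plus function name), not as code, so a process that receives the reference must be able to look the name up in that module. The example below reproduces the failed lookup inside a single process. The module name demo_session and the function expand_call_demo are hypothetical stand-ins, and reassigning __module__ is only a trick to imitate a session whose definitions the worker cannot see.

```python
import pickle
import sys
import types


def expand_call_demo(kwargs):
    """Hypothetical stand-in for a function defined in an interactive session."""
    return kwargs


# Pretend the function lives in a module named "demo_session".
demo_module = types.ModuleType("demo_session")
sys.modules["demo_session"] = demo_module
expand_call_demo.__module__ = "demo_session"
expand_call_demo.__qualname__ = "expand_call_demo"
demo_module.expand_call_demo = expand_call_demo

# The payload holds only the two names, not the function body.
payload = pickle.dumps(expand_call_demo)
names_only = b"demo_session" in payload and b"expand_call_demo" in payload

# Imitate a worker whose copy of the module lacks the definition.
del demo_module.expand_call_demo
try:
    pickle.loads(payload)
    lookup_error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    lookup_error = str(exc)  # the lookup by name fails, as in the traceback above
```

If this reading is right, anything that makes the function importable by name from a real module should let the workers find it.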

One solution posed is to put the multiprocessing code into a script.py and call that script from the Notebook cell. I haven't implemented this "solution" / workaround. As I wrote above, I have simply set cpus = 1, which invokes single-thread processing and executes quickly enough with the dataset used in this exercise.
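
For anyone who does want to try the script route, a minimal sketch follows. It is untested against the notebook's own functions: square, mp_job.py and run_in_script are hypothetical names, and the worker is a toy. The script defines its worker at module level and starts the pool only under the __main__ guard, and the notebook cell would launch it as a separate Python process.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of the stand-alone script. The worker is defined at module level,
# and the pool is created only when the file is run as the main program.
SCRIPT = textwrap.dedent('''
    import multiprocessing as mp


    def square(x):
        return x * x


    if __name__ == "__main__":
        with mp.Pool(processes=2) as pool:
            print(pool.map(square, range(5)))
''')


def run_in_script(source):
    """Write the source to a temporary .py file, run it, and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mp_job.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            check=True,     # raise if the script exits with an error
            timeout=120,    # do not let a hung pool block the notebook forever
        )
    return result.stdout.strip()
```

Calling run_in_script(SCRIPT) from a cell returns whatever the script printed. Real results would more likely be exchanged through files (for example CSV) than through stdout, and the timeout is there so that a hang surfaces as an exception instead of a frozen cell.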

References:
ipython/ipython#10894
https://stackoverflow.com/questions/48593694/python-multiprocessing-returning-attributeerror-when-following-documentation-cod
https://stackoverflow.com/questions/45719956/python-multiprocessing-attributeerror-cant-get-attribute-abc

Testing environment for multiprocessing
There are scripts at the post below which can be run in your environment to test that multiprocessing is working.
I simply saved the script from the answer post as a .py file, opened it in VS Code, and ran it with F5 in Debug mode, though I changed range(300) to 30 to shorten the exercise.
https://stackoverflow.com/questions/48660656/multiprocessing-python-3-6-on-windows-10-not-working


aldebaransearch commented on June 18, 2024

@gahobbsau, the issues I reported above actually appear in scripts (.py files), with all code after the imports put inside the '__main__' block and the multiprocessing code put in a separate .py file. Putting the executing code in the '__main__' block is needed to make multiprocessing run in a script on a Windows box. I assume that is related to the workaround you mention above for notebooks.

Hence, for me, there is still something more subtle to understand about the multiprocessing library that can be put together from the snippets in chapter 20.

