
Comments (7)

tg2k avatar tg2k commented on August 15, 2024 1

Would have replied sooner but was traveling. This turned out to be tricky, because I was running an anaconda install (installed via Chocolatey). Attempting to upgrade Scikit-Learn landed me in this problem.

After checking this article I went with a miniconda install (also via Chocolatey) and used a .yml file similar to that article.

For anyone reading this who is not experienced with miniconda, the basic steps were:

choco install miniconda3

Activate miniconda base env:

C:\tools\miniconda3\Scripts\activate.bat

Update the conda install further:

conda update -n base -c defaults conda

Then create the environment from the .yml file with conda env create -f [path/to/yml_file.yml]

Within the .yml file, my dependencies so far look about like this:

dependencies:
  # https://github.com/ageron/handson-ml3 recommends Python 3.10, though anaconda3 install had 3.9
  #- python=3.9
  - python=3.10
  - flask
  - numpy
  - pandas
  - requests
  - pip
  - ipykernel
  - matplotlib
  - scikit-learn
  - tensorflow
  - keras
  - pip:
    - urlextract

Unlike the Medium article, I opted for conda packages wherever possible, using pip only where no conda package was available.

I saw a note during an operation recommending that I install scikit-learn-intelex (see here), but when I tried to patch it in, it caused errors, so I bailed on that option. I'm unclear on whether this accelerator is still maintained.

With this done, I have a more current install, and I would discourage anyone from using anaconda as in the medium article above.

from handson-ml3.

tg2k avatar tg2k commented on August 15, 2024 1

Actually, as I've progressed through the book, it has become more and more difficult to keep a working Windows environment. Conflicts grew worse; I installed Mamba instead of Conda, pulled more from conda-forge, etc., but at a certain point (around Chapter 11) it became untenable even for Mamba to find a solution on Windows. I think some of the Conda packages aren't available at critical versions on Windows, leading to a situation where a consistent environment was difficult or impossible to attain. For all I know this could change, but the experience has left me with the distinct impression that the data science community treats Windows as an afterthought.

I also kept running into conflicts between multiple copies of libiomp5md.dll. I fixed one of them by forcing a newer version of numpy (maybe during Chapter 10), but it hit me again, so I looked into why I had the library versions I did. TensorFlow 2.10 was one of the packages using this DLL. I noted that 2.11 was out, and as I began looking at it, I discovered that the TensorFlow team had dropped native Windows GPU support in favor of WSL2.

This is annoying for Windows users: if you want the best TensorFlow experience, you are forced to use a bit of Linux on Windows. That said, WSL2 is better integrated than any VM solution I've ever used, and it is far, far easier to get all the package versions. I'm unclear on whether packaging issues themselves contributed to the TensorFlow team's decision. For those interested, the 2.11 release notes are here, and non-explanations from the team are here and here.

WSL2-based Install Process

The overall install process becomes significantly longer but at least it's workable. For me it was roughly:

Install WSL 2 with
wsl --install

Open WSL by running the "Ubuntu" app, or open a command line and just run wsl

If you have Windows folders for relevant scripts, use ln -s to link them for ease of use. Make sure those scripts have Unix line endings.
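For anyone who wants to script those two steps (symlinking a Windows-side folder and fixing line endings), here's a rough Python sketch; the helper name and the .sh glob are my own assumptions, so adjust to your layout:

```python
import os
from pathlib import Path

def link_and_fix_endings(win_dir: str, link_path: str) -> None:
    """Symlink a Windows-side folder into WSL (like `ln -s`) and
    convert CRLF line endings to LF in any shell scripts inside it."""
    if not os.path.lexists(link_path):
        os.symlink(win_dir, link_path)
    for script in Path(win_dir).glob("*.sh"):
        data = script.read_bytes()
        if b"\r\n" in data:
            script.write_bytes(data.replace(b"\r\n", b"\n"))
```

The dos2unix package does the line-ending part more thoroughly if you'd rather not roll your own.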

Install CUDA per this and this.
There are warnings about the possibility of overwriting the CUDA .so files so you may choose to back them up with something like cp -a -f /usr/lib/wsl/lib/libcuda.so* ~/libcuda/

The CUDA install above has a problem that causes ldconfig-related warnings and errors and can crash certain TensorFlow code. If you look in C:\Windows\System32\lxss\lib, the libcuda.so and libcuda.so.1 files are regular files when they should be symlinks. This directory gets mapped into WSL's /usr/lib/wsl/lib folder when WSL boots. To fix it, open an Administrator cmd prompt in Windows and run:

C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1

Then close anything running inside WSL and restart WSL (you can use the same cmd instance):

wsl --shutdown
wsl

Inside WSL, verify libcuda now has symlinks with ls -l /usr/lib/wsl/lib/libcuda*.
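If you'd rather script that verification, here's a small sketch (the default path is the WSL mapping mentioned above, but any directory works):

```python
from pathlib import Path

def libcuda_symlink_status(lib_dir: str = "/usr/lib/wsl/lib") -> dict:
    """Map each libcuda* file in lib_dir to True if it is a symlink.
    After the mklink fix, libcuda.so and libcuda.so.1 should be True."""
    return {p.name: p.is_symlink()
            for p in sorted(Path(lib_dir).glob("libcuda*"))}
```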

Install graphviz with

sudo apt-get install graphviz graphviz-dev
pip install pygraphviz

Install mamba with

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh

Run a new shell (this will activate the base environment too)

Add the Conda lib path to the end of ~/.bashrc (after the conda initialization block):

if command -v conda &>/dev/null; then
  # conda is on the PATH; append the active env's lib dir for CUDA libs
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
fi

Create mamba/conda env (in a section below I've posted some yml)
mamba env create -f [yml file]

Register Jupyter kernel
python3 -m ipykernel install --user --name=[your conda env]

Verify TensorFlow
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Per this, if the message has an "I" (info) prefix, then even though it may look like a warning about NUMA mode, CUDA cores should be working for TensorFlow.

Another verification of CUDA, per NVIDIA:
/usr/local/cuda/bin/nvcc -V

TensorFlow can print a lot of warnings about NUMA support, so edit ~/.bashrc and, near the other export lines, add
export TF_CPP_MIN_LOG_LEVEL=2
Alternatively, add this at the top of Jupyter notebooks (it must run before TensorFlow is imported):

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

Install protoc
sudo apt install -y protobuf-compiler

Restart VSCode if it is running. Install the Python extension and the WSL extension (formerly Remote-WSL). In the bottom-left corner of the VSCode window, connect to the WSL instance, which will re-launch VSCode connected to WSL.
Once connected to WSL, install the Python extension again (for the remote instance), along with the Jupyter extension (also for the remote instance). For any readers wondering how to do this: just pull up the extension in VSCode, and its listing now has extra options for managing it remotely.

YML

My YML looks roughly like so:

name: Your_WSL_ML

channels:
  - conda-forge
  - defaults
  
dependencies:
  - python=3.10
  - flask
  - numpy>=1.24.0
  - pandas
  - requests
  - pip
  - ipykernel
  - ipywidgets
  - tqdm
  - matplotlib
  - scikit-learn
  - conda-forge::cudatoolkit>=11.2
  - conda-forge::cudnn>=8.1.0
  - tensorflow>=2.11.0
  - keras>=2.11.0
  - python-graphviz
  - pydot>=1.4.2
  - conda-forge::keras-tuner
  - conda-forge::tensorflow-hub
  - conda-forge::tensorflow-datasets
  - statsmodels
  - pip:
    - urlextract
    - tensorflow-addons

Performance with WSL2

I have a new and reasonably powerful desktop computer, and prior to WSL I was able to run the heavier (generally Scikit-Learn) loads in 1/3 to 1/2 of the times the book warned about. However, with WSL these loads generally run slower than the book mentions, so my throughput has been significantly cut down. Theoretically, on very large loads the GPU support should make up for the WSL overhead. And at least I shouldn't have to deal with as much head-banging to resolve package installation issues. Fingers crossed on that as I continue through the book.

Make sure, though, to put all files for the book inside the WSL Ubuntu directory structure. If instead you do as I initially did and run it off /mnt/c (or a symlink thereof), you'll encounter significant slowdowns, as I mention in a comment below.

TensorFlow 2.11 Compatibility with book's Jupyter notebooks

Installing TensorFlow 2.11 revealed some incompatibilities between the book's current Jupyter notebook code and TensorFlow.

In Chapter 11 you can use this line:

optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, decay=1e-4)

The current optimizer requires decay_steps and decay_rate instead of decay, and although I tried to get identical results, in my own attempts I couldn't match the legacy SGD. I'm guessing it can be done. My attempt looked like this:

sgd_lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=10000,
    decay_rate=1e-4)
sgd_new_optimizer = tf.keras.optimizers.SGD(learning_rate=sgd_lr_schedule)
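For what it's worth, my reading (an assumption worth double-checking against the TF docs, not something from the book) is that the legacy decay argument applies inverse-time decay, lr0 / (1 + decay * step), so ExponentialDecay isn't a drop-in replacement; InverseTimeDecay with decay_steps=1 should be. The formulas can be compared in plain Python, no TensorFlow needed:

```python
def legacy_sgd_lr(lr0, decay, step):
    # Legacy Keras SGD `decay`: inverse-time decay applied per iteration.
    return lr0 / (1.0 + decay * step)

def inverse_time_decay(lr0, decay_steps, decay_rate, step):
    # Formula behind tf.keras.optimizers.schedules.InverseTimeDecay.
    return lr0 / (1.0 + decay_rate * step / decay_steps)

def exponential_decay(lr0, decay_steps, decay_rate, step):
    # Formula behind tf.keras.optimizers.schedules.ExponentialDecay.
    return lr0 * decay_rate ** (step / decay_steps)

# InverseTimeDecay(decay_steps=1, decay_rate=1e-4) tracks the legacy
# schedule exactly, while the ExponentialDecay attempt above does not:
for step in (0, 100, 10_000):
    assert abs(legacy_sgd_lr(0.01, 1e-4, step)
               - inverse_time_decay(0.01, 1, 1e-4, step)) < 1e-12
```

If that's right, tf.keras.optimizers.schedules.InverseTimeDecay(0.01, decay_steps=1, decay_rate=1e-4) passed to SGD should reproduce the legacy results.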

In Chapter 12 you can use this line:

class MyMomentumOptimizer(tf.keras.optimizers.legacy.Optimizer):

The .legacy base class avoids an issue with a missing _set_hyper() method. Alternatively, new code could be written for the current optimizer.

In Chapter 14

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

can be turned into a .legacy in the definition immediately before the Classification and Localization section.


tg2k avatar tg2k commented on August 15, 2024 1

@ageron I have no doubt this worked better on Windows some time ago. Some combination of rapidly advancing packages, lack of attention/support for Windows at coding and packaging levels, and TF 2.11's lack of Windows GPU support all conspired over time to ruin the pure-Windows experience.

It's unfortunate, because even if TF still had full Windows support I could run into problems with other packages, yet running on WSL comes with significant performance penalties. With pure Windows, my new computer easily beat your years-old laptop stats by a factor of 2-3x, but under WSL it's often 2-4x slower than any numbers you included.

It gets even worse if the files live on the Windows side rather than in the Ubuntu filesystem. Chapter 13's Exercise 10 IMDB tree printout is quick with all the files on WSL, but with the files under /mnt/c (i.e., marshalled from Windows over the 9P protocol, which should eventually get somewhat faster), it became a multi-minute operation.
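To quantify the difference on your own machine, here's a rough sketch; the commented paths at the bottom are placeholders for your own dataset locations:

```python
import os
import time

def time_tree_walk(root):
    """Walk a directory tree and return (elapsed seconds, file count).
    Per-file metadata cost is what dominates on /mnt/c over 9P."""
    start = time.perf_counter()
    n_files = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        n_files += len(filenames)
    return time.perf_counter() - start, n_files

# Hypothetical comparison; substitute your own paths:
# time_tree_walk(os.path.expanduser("~/handson-ml3/datasets"))  # native ext4
# time_tree_walk("/mnt/c/Users/you/handson-ml3/datasets")       # 9P-backed
```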

None of this is intended to steer Windows users away; I just hope that some of the info above (which I've been editing as I continue through the book) helps other users, and you may want to incorporate some of it into your own instructions.


ageron avatar ageron commented on August 15, 2024

Thanks for your feedback @tg2k .
This warning is displayed because Scikit-Learn 1.0.2 called scipy.stats.mode() without setting the keepdims argument, which is fine for now (so you can ignore the warning), but it will break when SciPy reaches version 1.11 (it's 1.10 now).
Luckily, the Scikit-Learn team knows about this issue, and they've fixed it in version 1.1.2 (see PR scikit-learn/scikit-learn#23633).

If you don't want to see this warning, you can upgrade Scikit-Learn to 1.1.2 or later.
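To illustrate what the keepdims argument controls, here's a toy stand-in (my own sketch, not SciPy's actual implementation): with keepdims=True the reduced axis is retained with length 1, so the result still broadcasts against the input, and newer SciPy wants callers to choose this explicitly.

```python
from collections import Counter

def mode_axis0(rows, keepdims):
    """Toy analogue of scipy.stats.mode(a, axis=0): per-column mode.
    keepdims=True keeps the reduced axis as a length-1 dimension."""
    cols = list(zip(*rows))
    modes = [Counter(c).most_common(1)[0][0] for c in cols]
    return [modes] if keepdims else modes
```

For example, mode_axis0([[1, 2], [1, 3], [2, 3]], keepdims=False) gives the flat per-column modes [1, 3], while keepdims=True wraps them as [[1, 3]].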

Hope this helps.


ageron avatar ageron commented on August 15, 2024

This is great feedback, and I'm sure it will be useful to other readers, thanks a lot @tg2k ! 👍


ageron avatar ageron commented on August 15, 2024

This is gold @tg2k , thanks so much for taking the time to write this thorough review of your ML experience on Windows. I agree with you that Windows does not seem to be a high priority for most of the ML community, sadly. In fact, I'm currently consulting for a company that's entirely running on Windows, and I keep running into issues like the ones you encountered, it's quite frustrating. That said, I did run all the notebooks in this project on Windows before the book came out (on a Windows Server VM on Google Cloud), but it looks like some things have broken since then. I'll investigate as soon as I can.


tg2k avatar tg2k commented on August 15, 2024

An interesting thing came up when I went to upgrade to TensorFlow 2.12. It turns out the .bashrc change above was probably wrong; the right way to get LD_LIBRARY_PATH set, per the TF install guide, is to activate your conda env and then run

echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

See also this issue I created in the TF project.

TensorFlow 2.12 requires newer packages elsewhere, some of which (like cudnn and the Hugging Face transformers/datasets) are currently only on pip. I also ended up reapplying the Windows-side \Windows\System32\lxss\lib hacks from above. My env yml file for TF 2.12 looks like this:

name: YOUR_WSL_ML

channels:
  - conda-forge
  - defaults
  
dependencies:
  - python=3.11
  - flask
  - numpy>=1.24.0
  - pandas
  - requests
  - pip
  - ipykernel
  - ipywidgets
  - tqdm
  - matplotlib
  - scikit-learn
  - conda-forge::cudatoolkit>=11.8.0  
  - python-graphviz
  - pydot>=1.4.2
  - statsmodels
  - pip:
    - urlextract
    - nvidia-cudnn-cu11>=8.6.0.163
    - tensorflow>=2.12.0
    - keras>=2.12.0
    - keras-tuner
    - tensorflow-hub
    - tensorflow-datasets
    - tensorboard
    - transformers
    - datasets

