
Comments (12)

cboettig commented on August 26, 2024

@ryangarner Thanks for opening this issue. Yeah, it would be nice if py_install worked out of the box. Note that reticulate is in fact already installed, as is pip3, but the python packages are installed system-wide using pip3 directly rather than in a virtualenv, which isn't much help for a user trying to install additional packages from R.

@noamross would love your thoughts on how best to go about this. In particular, as you know, Debian/Ubuntu use separate namespaces for python (2.7) and python3, and I haven't figured out how to get the reticulate functions from R to use the python3 versions for everything. We have both python versions installed on the image (actually RStudio pulls in both versions now), so while we could do something like symlinking with ln -s /usr/bin/pip3 /usr/local/bin/pip and ln -s /usr/bin/python3 /usr/local/bin/python, I'm not sure that's a good idea, and I'm not entirely sure what to do so that reticulate will find python3-virtualenv instead of python-virtualenv after installing it.

@choldgraf could probably set me straight on the best way to go about the python virtualenv setup here.

from ml.

choldgraf commented on August 26, 2024

hmm - is the main question "how are environments set up with virtualenv in Python?" - e.g., is this a file paths problem?


cboettig commented on August 26, 2024

Thanks Chris, I guess this is really two questions:

Q1. What's the best way to set up a Python3 environment for Docker images?

As you know, Ubuntu/Debian distros expect users to request python3 explicitly; plain python and pip both mean Python 2. The default behavior of reticulate is to look for the python and pip binaries, i.e. to use Python 2. Presumably this can be changed in the reticulate config (e.g. use_python("/usr/bin/python3")), but I don't think that updates the paths for pip installs. Alternatively, we could go the symlink route. Note that the official tensorflow Dockerfiles make this configurable in build args, but also symlink python3 to /usr/local/bin/python so that it works without the 3, though I'm not sure why they chose to do so.
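
For illustration, the two routes mentioned here might look roughly like this in a Dockerfile. This is a sketch, not what the image currently does. RETICULATE_PYTHON is the environment variable reticulate reads to choose an interpreter; writing it to Renviron.site is just one assumed way to make it visible to every R session, and the /usr/bin paths are assumptions about where python3 and pip3 live.

```dockerfile
# Sketch only; base image taken from this thread.
FROM rocker/ml-gpu

# Option A: point reticulate at python3 for every R session.
# Renviron.site is R's site-wide environment file. This does not change
# which pip is used for installs.
RUN echo "RETICULATE_PYTHON=/usr/bin/python3" >> "$(R RHOME)/etc/Renviron.site"

# Option B: the symlink route, so that plain `python` and `pip` mean Python 3.
# Assumes python3 and pip3 are in /usr/bin and that /usr/local/bin comes
# earlier on PATH than /usr/bin. -f replaces any existing link or file.
RUN ln -sf /usr/bin/python3 /usr/local/bin/python \
 && ln -sf /usr/bin/pip3 /usr/local/bin/pip
```

The two options are alternatives; Option B affects anything on the image that calls plain python, not just reticulate.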

Q2. What's the best choice for managing python environments in our context -- pip, virtualenv, or conda? (and how do we get those working in python3 instead of python2 on debian?)

reticulate is happy to use any of these options. Currently we're just going pure pip, but then users cannot install additional packages without root. I suspect we should set things up to use virtualenv, though this raises a series of additional questions: (a) How do you get reticulate to use python3 when creating a virtualenv? (b) What's the best choice of home path for the virtualenv (e.g. we would at least like the same python env to be available to root and non-root users)? (c) Is virtualenv the best choice at all here (e.g. Nick tells me we'd get better tensorflow performance using conda with Intel MKL instead)?


choldgraf commented on August 26, 2024

Sorry for the slow response. I'm actually not a super expert on python paths so may not be the best person to ask, but my understanding is:

The simplest option for generic data science workflows that might not involve only Python packages is to use miniconda to handle environments, along with the conda-forge channel. The other option is to use virtualenv and system python with pip. It's much more lightweight, though it can be non-trivial to install certain kinds of packages (e.g. mapping packages like fiona that require non-python dependencies). You might get some inspiration from the base repo2docker template here: https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L14

I don't believe that you must use pip with root privileges. Couldn't you install using the --user flag? That'd install to a user directory instead of root.
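
A minimal sketch of the --user route, assuming the image's regular user is rstudio with home directory /home/rstudio (both are assumptions; adjust to the image). pandas is only a placeholder package.

```dockerfile
# Sketch only; base image taken from this thread.
FROM rocker/ml-gpu

# Install as the regular user; --user targets ~/.local instead of the
# system-wide site-packages, so no root privileges are needed.
USER rstudio
RUN python3 -m pip install --user --no-cache-dir pandas

# Console scripts installed with --user go to ~/.local/bin.
ENV PATH=/home/rstudio/.local/bin:${PATH}

# Assumption: the image's init process expects to start as root.
USER root
```

Note that --user installs are per user, so packages installed this way by rstudio would not be visible to root's python.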

@yuvipanda might have some ideas for the best path forward here as well!

ps: for the MKL stuff, that might be the case...I've had differing results using MKL vs. BLAS for linear algebra stuff - I think it depends a lot on the specific computation you're running


yuvipanda commented on August 26, 2024

If you're already using system python, my recommendation is:

  1. Early on in the Dockerfile, create a virtualenv (as root): python3 -m venv /opt/venv
  2. Change the ownership of /opt/venv to your regular user, so they can install packages into it without extra effort: chown -R rstudio:rstudio /opt/venv
  3. Modify PATH to include the 'bin' directory inside the virtualenv. This will make python, pip etc. default to using the python inside the virtualenv, and hence python3: ENV PATH=/opt/venv/bin:${PATH}
  4. Install whatever base packages you want into this virtualenv (as your normal user): python3 -m pip install --no-cache-dir <packages> or python3 -m pip install --no-cache-dir -r requirements.txt. The --no-cache-dir flag helps reduce the size of your docker image. Note that this must be done as your normal user; accidentally doing this as root will cause issues.

This should work for 99% of use cases. The big reason to move away from this is if you want to use a version of python different from what is provided by your system python. If you need to use a newer version of python, my recommendation is to use miniconda to get just python, but still use a virtualenv for everything else.
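
Put together, the four steps above could look something like the following Dockerfile fragment. It is a sketch under assumptions: the base image is the one used elsewhere in this thread, the rstudio user is assumed to exist in it, the python3-venv package is assumed to be what provides the venv module on Debian/Ubuntu, and pandas stands in for whatever base packages are wanted.

```dockerfile
# Sketch only; base image and user name are taken from this thread.
FROM rocker/ml-gpu

# Steps 1 and 2: create the virtualenv as root, then hand it to the regular
# user. Doing both in one RUN avoids a second layer that copies every file
# just to change its owner.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3-venv \
 && rm -rf /var/lib/apt/lists/* \
 && python3 -m venv /opt/venv \
 && chown -R rstudio:rstudio /opt/venv

# Step 3: put the virtualenv first on PATH so python and pip resolve to it.
ENV PATH=/opt/venv/bin:${PATH}

# Step 4: install base packages as the regular user, not as root.
USER rstudio
RUN python3 -m pip install --no-cache-dir pandas

# Assumption: the image's init process expects to start as root.
USER root
```

Because /opt/venv is owned by rstudio, that user can later run pip install inside the container without root.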


ryangarner commented on August 26, 2024

This is my quick fix Dockerfile to get reticulate to work properly. Hope this helps!

FROM rocker/ml-gpu

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install curl -y
RUN curl -O https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN apt-get install python-virtualenv -y
RUN pip install virtualenv --upgrade


cboettig commented on August 26, 2024

@ryangarner thanks. Yup, doing apt-get install python-virtualenv will, I believe, install the python2 version, as I've commented above. I think you could condense your version into

RUN apt-get update && apt-get -y install python-virtualenv python-pip

(Note that in general you want to have apt-get update and apt-get install on the same line in Dockerfiles, and to avoid upgrade, to play nicely with caching.)

If you wanted to stick with the python3 versions (Tensorflow plans to deprecate python 2 in the next year anyway) you'd do

RUN apt-get update && apt-get -y install python3-virtualenv python3-pip

but reticulate won't find pip or virtualenv then.

I quite like @yuvipanda's proposed workflow above, so I'll take a stab at that. In particular, it sounds like step 3 will make python == python3? Yuvi, is there any risk of that messing up other things that are using python2?


yuvipanda commented on August 26, 2024

@cboettig I made #21


yuvipanda commented on August 26, 2024

@cboettig it shouldn't mess anything up, since it's only for things that run with the specific PATH set (so things started by the user in this container). This is also how mybinder.org runs (python refers to python3 there), so I think it's ok!


cboettig commented on August 26, 2024

@ryangarner if you use

reticulate::virtualenv_install("/opt/venv", "pandas") 

things should work as expected. You may also want to call reticulate::use_virtualenv("/opt/venv").

Not sure what is up with py_install(), since it should basically be calling use_virtualenv under the hood, but somehow its error handler is checking for the virtualenv first and failing to find it. Still investigating...


yuvipanda commented on August 26, 2024

linking rstudio/reticulate#496 as related.


cboettig commented on August 26, 2024

Thanks Yuvi! Digging a bit more, this seems to be a problem in the reticulate source code inside py_install(), which assumes binaries are in ("/usr/bin", "/usr/local/bin", path.expand("~/.local/bin")) rather than searching PATH. I've opened a separate issue here: rstudio/reticulate#499 (comment)

