spietras / rules_conda

Rules for creating conda environments in Bazel :green_heart:

Home Page: https://spietras.github.io/rules_conda

License: MIT No Attribution

Starlark 76.59% Python 2.83% Shell 9.70% Batchfile 10.89%

bazel conda python

rules_conda's People

jiawen gabrieldougherty yibum field33 anyscale ciuncan sfc-gh-zpeng benbrittain sandeepgadhwal tqtensor leriel

rules_conda's Issues

Add tests

It would be nice to have some test automation so we can see whether new changes break something.

README should be updated with `mamba` flags

The README still uses the old flags before mamba support was added.

Add GitHub Action for linting

It would be nice to have a GitHub Action set up for code autoformatting. Maybe we could use the existing Buildifier GitHub Action: https://github.com/thompsonja/bazel-buildifier

Recreating environments from scratch

Hey, any update on that?

Tutorial how to build a docker image with rules_conda and bazel

Hi,

Thank you for providing these nice rules! I am new to Bazel and would like help building a py3_image instead of a py_binary.

Thank you!

Mamba

Think about using mamba instead of conda. That might speed things up.

Set `clean` flag to `False` by default

This could improve the user experience (e.g. faster consecutive recreations) for users who are not aware of the flag.

Update dependencies

bazelisk binaries
bazel version
conda version
rules_python version in the example

Not supporting Python 2 might even be considered good practice nowadays, as the world wants to move on. I don't use it and don't know anyone who does, so for me it makes no difference. Assuming that only Python 3 will be used might even make things easier.

I think we should wait some time, and if no one protests, we'll drop it.

rules_conda runs during analysis phase

I know that this might be a rather large request, but is there any chance of changing rules_conda so that it does not do its heavy work (installing conda and setting up the environment) during the analysis phase?

As far as I understand, basically all other rulesets only set up a scaffold of rules during the analysis phase, which makes them fast to run, and only do the heavy lifting if one of their rules is involved in the action graph of a target. What rules_conda is doing right now may be fine for a very Python-focused Bazel monorepo, but in one that has all kinds of toolchains in it (which is the case for ours), simply including rules_conda increases e.g. the CI time of running lints for the frontend parts of the repo from ~1min to ~6min, because the conda environment is being set up even though it will never be used.

Would love to hear your thoughts on this!

Expose Python interpreter in a platform-independent way

Context

I'm trying to use rules_conda with pybind11_bazel in Hermetic Python mode. This is necessary because pybind11 requires extension modules to be built for the correct Python version.

pybind11_bazel supplies a python_configure rule that takes an attr python_interpreter_target of type Label:

python_configure(
  name = "local_config_python",
  python_interpreter_target = "@python_interpreter//:python_bin",
)

Problem

There does not seem to be a way to pass the Python interpreter created by rules_conda's environment to python_configure.

My configuration

In WORKSPACE:

load("@rules_conda//:defs.bzl", "conda_create", "load_conda", "register_toolchain")

load_conda(
    quiet = False,  # use True to hide conda output
    version = "4.10.3",  # optional, defaults to 4.10.3
)

conda_create(
    name = "conda_test_env",
    timeout = 600,  # each execute action can take up to 600 seconds
    clean = False,  # use True if you want to clean conda cache (less space taken, but slower subsequent builds)
    environment = "@//:conda_test_env.yml",  # label pointing to environment.yml file
    quiet = False,  # use True to hide conda output
)

register_toolchain(
    py3_env = "conda_test_env",
)

This generates an external/conda_test_env/BUILD file containing:

py_runtime(
    name = "python_runtime",
    files = glob(
        ["conda_test_env/**/*"],
        exclude_directories = 0,
    ),
    interpreter = "conda_test_env/bin/python",
    python_version = "PY3",
)

On the file system, the Python interpreter is located at: $(bazel info output_base)/external/conda_test_env/conda_test_env/bin/python.

Things I tried:

In WORKSPACE:

python_configure(
    name = "local_config_python",
    python_interpreter_target = "@conda_test_env//conda_test_env/bin/python",
)

This doesn't work: Bazel complains that bin is not a package, since a package requires a BUILD file, which makes sense. Changing .../bin/python to /.../bin:python returns a similar error.

Manually adding a BUILD file to the bin directory then causes a problem for py_runtime, which contains the line interpreter = "conda_test_env/bin/python":

ERROR: /private/var/tmp/_bazel_jiawen/8678712aa06452e8f0efd934ed354368/external/conda_test_env/BUILD:6:11: Label '@conda_test_env//:conda_test_env/bin/python' is invalid because '@conda_test_env//conda_test_env/bin' is a subpackage; perhaps you meant to put the colon here: '@conda_test_env//conda_test_env/bin:python'?

But adding the colon also fails, apparently because the value must be a target name and not a label:

ERROR: /private/var/tmp/_bazel_jiawen/8678712aa06452e8f0efd934ed354368/external/conda_test_env/BUILD:6:11: @conda_test_env//:python_runtime: invalid label 'conda_test_env/bin:python' in attribute 'interpreter' in 'py_runtime' rule: invalid target name 'conda_test_env/bin:python': target names may not contain ':'

Any ideas?
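
The two errors above follow from how Bazel splits a label into a repository, a package, and a target name. The sketch below is a simplified Python illustration of that split, not Bazel's actual parser; `split_label` is a hypothetical helper that handles only absolute labels and ignores other edge cases.

```python
def split_label(label):
    """Split an absolute Bazel label into (repo, package, target).

    Simplified illustration: handles only '@repo//pkg:target' and
    '@repo//pkg' forms, not relative labels.
    """
    repo, sep, rest = label.partition("//")
    if not sep:
        raise ValueError("expected an absolute label containing '//'")
    repo = repo.lstrip("@")
    if ":" in rest:
        package, _, target = rest.partition(":")
    else:
        # Without a colon, the target defaults to the last package component.
        package = rest
        target = rest.rsplit("/", 1)[-1]
    return repo, package, target


# Target in the root package of the repo; the slashes are part of the target name.
print(split_label("@conda_test_env//:conda_test_env/bin/python"))
# Target 'python' in the package 'conda_test_env/bin', which needs its own BUILD file.
print(split_label("@conda_test_env//conda_test_env/bin:python"))
# No colon: the whole path is treated as a package.
print(split_label("@conda_test_env//conda_test_env/bin/python"))
```

Read this way, the `interpreter` attribute in the generated `py_runtime` names a file in the root package, so adding a BUILD file under `bin` turns that directory into a separate package and the original path no longer resolves from the root.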

Test framework

Currently we have only one test: it calls the rules with standard arguments and checks whether all packages were installed correctly.

But as we grow, new arguments are added to the rules and the possible input configurations multiply. We want to test whether execution succeeds for different inputs to the rules, not only for one standard input.

As these are repository rules, they are used inside a WORKSPACE file. This means that each different input configuration requires a different workspace.

It would be best to just have some configuration matrix with different input values. The test framework would make a workspace for each combination of values, check the execution there, and report which ones fail. The test can be the same as it is now - checking if installed package versions are correct.
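
The configuration matrix described above could be expanded with a few lines of Python. This is only a sketch: the parameter names (`use_mamba`, `clean`, `quiet`) are arguments mentioned elsewhere on this page, but the matrix values and the `expand_matrix` helper are hypothetical, and generating a workspace and running Bazel for each combination is left out.

```python
import itertools


def expand_matrix(matrix):
    """Return one dict per combination of values in the matrix."""
    names = sorted(matrix)
    value_lists = [matrix[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Hypothetical matrix of rule arguments to test.
MATRIX = {
    "use_mamba": [True, False],
    "clean": [True, False],
    "quiet": [True, False],
}

if __name__ == "__main__":
    for config in expand_matrix(MATRIX):
        # A real framework would generate a WORKSPACE from `config` here,
        # run the build in it, and record whether it succeeded.
        print(config)
```

Keeping the expansion separate from the workspace generation would let the same matrix drive both local runs and CI jobs.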

Example for using different environment files for different platforms

I'm going down a Bazel/conda monorepo rabbit hole. I did a bunch of reading, and it seems that the existing rules, by passing kwargs, should be able to support the use case where different platforms require different environment files (e.g., GPU support on Linux but not on Mac).

I think this is possible with a combination of:

Bazel platforms
Passing --platforms=... as Configurable attributes.
and Toolchains passing platform constraints to register_toolchains.

Shall I set up a simple example with a test?

License file

Hi,

First, many thanks for this work. It really helps to make hermetic conda builds! The project does not have a license, which may prevent its use. Could you select one?

Release `0.1.0`

Major changes have been introduced lately: miniforge support with #21 and mamba support with #22. It might be a good time to release a new version of rules_conda.

Let's follow semantic versioning from now on and start with 0.1.0.

Miniforge

Consider adding support for miniforge. It's a conda installer similar to miniconda, but it's developed by the community and has conda-forge as the default channel. Some people might still want to use the usual miniconda installation, so we should make it possible to choose (probably with miniconda being the default).

This idea was originally brought up by @jiawen in #2 (comment)

Example could not build due to the gap between release version and main branch

I see that rules_conda now adds mamba support in the main branch, which is fantastic.
However, the example could not be built and failed with the following error:

❯ ./bazelw run app
Starting local Bazel server and connecting to it...
ERROR: .../rules_conda/example/WORKSPACE:64:11: //external:conda: no such attribute 'conda_version' in 'load_conda_rule' rule
ERROR: .../rules_conda/example/WORKSPACE:64:11: //external:conda: no such attribute 'mamba_version' in 'load_conda_rule' rule
ERROR: .../rules_conda/example/WORKSPACE:64:11: //external:conda: no such attribute 'install_mamba' in 'load_conda_rule' rule
ERROR: .../rules_conda/example/WORKSPACE:72:13: //external:py2_env: no such attribute 'use_mamba' in 'conda_create_rule' rule
ERROR: .../rules_conda/example/WORKSPACE:82:13: //external:py3_env: no such attribute 'use_mamba' in 'conda_create_rule' rule
ERROR: error loading package 'external': Package 'external' contains errors
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)

This is because the example WORKSPACE loads the release version of rules_conda by default rather than the one from the main branch, while new features like conda_version have already been added to the example.
As a result, there is a gap between the release version and the main branch.

I think it might be good to turn on local_repository by default and leave the release version as another option in the example. Since people are more likely to clone the repo first and then run the example, this would make trying out the example smoother.

I also tried git_repository (using a commit zip). It helps bridge the gap, but Bazel then clones the whole repo (including the 26M example with the bazelisk binary).