Comments (20)

dbast avatar dbast commented on September 16, 2024

The right place to fix the license mapping is a PR against the license code block starting here https://github.com/conda/conda-build/blob/3b99b2222a067e113a2282926871cd1e5406ee2b/conda_build/skeletons/cran.py#L1521

from conda_r_skeleton_helper.

bgruening avatar bgruening commented on September 16, 2024

@nick-youngblut nice catch! Are you going to do a PR?


nick-youngblut avatar nick-youngblut commented on September 16, 2024

Sure. I can look into the code and make the change


nick-youngblut avatar nick-youngblut commented on September 16, 2024

@bgruening should I add a check that the license: is one of the valid licenses:

  • Apache-2.0
  • Apache-2.0 WITH LLVM-exception
  • BSD-3-Clause
  • BSD-3-Clause OR MIT
  • GPL-2.0-or-later
  • LGPL-2.0-only OR GPL-2.0-only
  • LicenseRef-HDF5
  • MIT
  • MIT AND BSD-2-Clause
  • PSF-2.0

... or just include a regex to change license: GPL-2 to license: GPL-2.0-or-later?
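To make the two options concrete, here is a minimal sketch that combines them. It assumes the recipe is available as plain text with a single `license:` line; the function names `fix_gpl2` and `check_license` are hypothetical, and the set of valid licenses is only the list above, not the full SPDX list.

```python
import re

# Identifiers copied from the list above; a real check would likely
# consult the complete SPDX license list instead.
VALID_LICENSES = {
    "Apache-2.0",
    "Apache-2.0 WITH LLVM-exception",
    "BSD-3-Clause",
    "BSD-3-Clause OR MIT",
    "GPL-2.0-or-later",
    "LGPL-2.0-only OR GPL-2.0-only",
    "LicenseRef-HDF5",
    "MIT",
    "MIT AND BSD-2-Clause",
    "PSF-2.0",
}

# Matches a whole "license: GPL-2" line, but not "license: GPL-2.0-or-later".
GPL2_LINE = re.compile(r"^([ \t]*license:[ \t]*)GPL-2[ \t]*$", flags=re.MULTILINE)
# Captures the value of the first "license:" line (not "license_family:").
LICENSE_LINE = re.compile(r"^[ \t]*license:[ \t]*(.+?)[ \t]*$", flags=re.MULTILINE)


def fix_gpl2(recipe_text):
    """Rewrite 'license: GPL-2' to 'license: GPL-2.0-or-later'."""
    return GPL2_LINE.sub(r"\1GPL-2.0-or-later", recipe_text)


def check_license(recipe_text):
    """Return the license value, or raise ValueError if missing or not recognized."""
    match = LICENSE_LINE.search(recipe_text)
    if match is None:
        raise ValueError("no license: field found")
    value = match.group(1)
    if value not in VALID_LICENSES:
        raise ValueError(f"unrecognized license: {value!r}")
    return value
```

Running the replacement first and the check second means a recipe that still fails the check has a license the regex does not know about, which is a useful signal to fix it by hand.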


bgruening avatar bgruening commented on September 16, 2024

I think a check is a nice idea! Thanks!


nick-youngblut avatar nick-youngblut commented on September 16, 2024

OK. I've added the GPL-2 replacement regex and an SPDX license check for both run.py and run.R: #49


bgruening avatar bgruening commented on September 16, 2024

Thanks!


jdblischak avatar jdblischak commented on September 16, 2024

@nick-youngblut Thanks for identifying and fixing this issue!

I think this is a good reminder though to bring up a subject we've discussed in the past: this helper script was supposed to be temporary. Ideally most of this should be upstreamed, especially the license stuff. Using regexes to edit unstructured text is so fragile. conda-forge has proved itself to be an important part of the scientific computing ecosystem. Couldn't we get a student, e.g. Google Summer of Code, to implement some flags for the skeleton script like --no-comments, --use-spdx, etc.?


nick-youngblut avatar nick-youngblut commented on September 16, 2024

Given the rigidity of CRAN for R packages, I'm surprised that one cannot create a direct conversion of all CRAN packages to conda recipes. Maybe the underlying non-R dependencies make this unrealistic, except maybe with a pre-trained deep learning language model trained on all existing CRAN => conda_recipe data.


jdblischak avatar jdblischak commented on September 16, 2024

Maybe the underlying non-R dependencies make this unrealistic, except maybe with a pre-trained deep learning language model trained on all existing CRAN => conda_recipe data.

It can get tricky, especially when dealing with compiled code. There are resources like remotes::system_requirements() that provide more structured data on the non-R dependencies, so a mapping of these to available conda recipes would be useful.

Though just to be clear, I'm not advocating that we attempt to get all of CRAN onto conda-forge. I'm happy with the current demand-driven model where users only add the packages that they need.


nick-youngblut avatar nick-youngblut commented on September 16, 2024

Though just to be clear, I'm not advocating that we attempt to get all of CRAN onto conda-forge. I'm happy with the current demand-driven model where users only add the packages that they need.

I'm guessing this is why many people stick with CRAN for R package installation instead of using conda. If all pypi and CRAN packages were available via conda, then conda could become a universal software manager for data scientists that utilize R, python, and various command line software. The current unnecessary divide in python and R package development and management (e.g., CRAN and pypi/conda, or Jupyter vs RStudio) creates hurdles for data scientists that just want to use the best tool for the job, regardless of what language it was written in.


jdblischak avatar jdblischak commented on September 16, 2024

If all pypi and CRAN packages were available via conda, then conda could become a universal software manager for data scientists that utilize R, python, and various command line software

@nick-youngblut You make a very compelling argument!

One issue is that we know we can't get to 100%, especially on Windows. We already have many examples where we can't get the build to work on Windows because of missing system dependencies. To make it universal, we'd need to use a trick like the bspm package does, where it overrides the default install.packages() function. This allows it to install Debian/Ubuntu binaries when available, and otherwise install directly from CRAN.
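As a rough illustration of that fallback idea applied to conda, the sketch below picks an install command per package: conda when a recipe exists, CRAN otherwise. The helper names and the `available_on_conda` set are hypothetical (the caller would have to supply the set of known recipes); the only convention assumed is that conda-forge names R packages `r-<lowercased CRAN name>`.

```python
def conda_name(cran_package):
    """conda-forge convention: R packages are named r-<lowercased CRAN name>."""
    return "r-" + cran_package.lower()


def install_command(cran_package, available_on_conda):
    """Return an argv list that installs one package, preferring conda.

    available_on_conda is a set of conda package names supplied by the
    caller (hypothetical here; nothing in this sketch queries a channel).
    """
    name = conda_name(cran_package)
    if name in available_on_conda:
        return ["conda", "install", "--yes", "-c", "conda-forge", name]
    # Fall back to a plain CRAN install when no conda recipe is known.
    r_code = f'install.packages("{cran_package}", repos = "https://cloud.r-project.org")'
    return ["Rscript", "-e", r_code]
```

The returned list can be passed to `subprocess.run` unchanged; keeping the decision separate from the execution makes the fallback easy to test without touching either package manager.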


nick-youngblut avatar nick-youngblut commented on September 16, 2024

One issue is that we know we can't get to 100%, especially on Windows.

Given that most data science is done on a linux/unix OS, and users that only have access to a Windows machine can use linux via a VM, dual boot, or free/cheap cloud-based services, why must windows be supported?


jdblischak avatar jdblischak commented on September 16, 2024

why must windows be supported?

@nick-youngblut I don't recommend Windows for scientific computing, but from a purely practical standpoint, many people start their programming journey on Windows. I certainly did. If the goal is to be viewed as "universal", I think we should try to support Windows as much as is reasonable. Though we are kind of getting away from my point: we're never going to get 100% of CRAN packages converted to conda (at least not on a volunteer basis). While Windows is the most problematic, there are also missing dependencies on macOS, so having a convenient way to fall back to CRAN would be ideal.

The right place to fix the license mapping is a PR against the license code block starting here

@dbast Thanks for the pointer! I recognize that code 😄 Though I think implementing SPDX identifiers is going to be more of a social issue than a technical one. The potential use of SPDX for licenses has been discussed at least as far back as 2017, e.g. conda/conda#5280, and it's never been implemented. The new grayskull replacement supports SPDX, but it's unclear when R support is going to be added. Maybe we could add a flag, e.g. --use-spdx, to the existing cran skeleton to allow the optional use of SPDX identifiers?
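A minimal sketch of what such an opt-in flag could look like, using argparse. Everything here is hypothetical: `--use-spdx` does not exist in the cran skeleton, the parser is a stand-in rather than conda-build's own, and the mapping holds only two illustrative entries (the GPL-2 entry follows the replacement discussed earlier in this thread).

```python
import argparse

# Illustrative subset only; this is NOT the mapping used by conda-build.
SPDX_BY_CRAN_LICENSE = {
    "GPL-2": "GPL-2.0-or-later",   # replacement proposed earlier in this thread
    "MIT + file LICENSE": "MIT",
}


def build_parser():
    """Stand-in parser with a hypothetical --use-spdx switch (off by default)."""
    parser = argparse.ArgumentParser(prog="cran-skeleton-sketch")
    parser.add_argument("package", help="CRAN package name")
    parser.add_argument(
        "--use-spdx",
        action="store_true",
        help="emit SPDX identifiers in the license field",
    )
    return parser


def license_field(cran_license, use_spdx):
    """Return the license string to write, translating only when opted in."""
    if use_spdx:
        # Unknown licenses pass through unchanged rather than being guessed.
        return SPDX_BY_CRAN_LICENSE.get(cran_license, cran_license)
    return cran_license
```

Because the flag defaults to off, existing callers would get exactly the output they get today, which is the backwards-compatibility property the flag is meant to preserve.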


nick-youngblut avatar nick-youngblut commented on September 16, 2024

I don't recommend Windows for scientific computing, but from a purely practical standpoint, many people start their programming journey on Windows

I get your point, but in this new age of free/cheap cloud computing, does anyone have to start their programming journey on their own machine? One could argue that various cloud-based services make it much easier for new programmers to get started.

Though we are kind of getting away from my point: we're never going to get 100% of CRAN packages converted to conda (at least not on a volunteer basis)

I agree, at least without a very sophisticated automated method, which is likely too complex to attempt right now given the current state of AI (e.g., massive amounts of training required, which still doesn't result in logical reasoning).


dbast avatar dbast commented on September 16, 2024

@jdblischak Why do you think changing the mapping is a social topic? conda-forge is anyway happy with spdx and I don't think anybody else would have objections ...

@nick-youngblut one of the reasons why conda is popular is the fact that it can create consistent environments in userspace (without admin rights) across operating systems... you can't imagine how many users in enterprise / bank companies are stuck with windows...

@ALL there is a conda-build PR that does mapping for system requirements, conda/conda-build#3826, which enables large build-outs of CRAN packages with very little intervention. Maybe somebody can find the time to finish / rebase it.


jdblischak avatar jdblischak commented on September 16, 2024

Why do you think changing the mapping is a social topic?

@dbast Because the majority of the discussion in that Issue I linked to was about consensus, backwards compatibility, and possibly creating a separate license field for the SPDX identifier. And the fact that it's been 4 years and nothing has been implemented.

conda-forge is anyway happy with spdx and I don't think anybody else would have objections ...

I agree conda-forge has standardized on it, but we're not the only users of the conda-build skeletons (nor do we have write access to the conda-build repo AFAIK). That's why I suggested a flag like --use-spdx. That would allow conda-forge users to use the SPDX identifiers without breaking backwards compatibility.

you can't imagine how many users in enterprise / bank companies are stuck with windows...

I've also found myself in the situation of a locked-down Windows machine (fortunately only temporarily until I was given admin rights), and I was very grateful to be able to quickly bootstrap a working data science environment with conda.

there is a conda-build PR that does mapping for system requirements

Very cool! Thanks for bringing this to our attention. From skimming the code, it seems like it parses the SystemRequirements field, and then looks it up in a dependency mapping file. Is the idea to translate something like the existing sysreqsdb to conda packages?
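For anyone following along, the general shape of that approach can be sketched as below. This is not the code from the PR: the mapping entries, the function names, and the simple comma/semicolon splitting are all assumptions for illustration, and real SystemRequirements fields are free-form text that would need more careful handling.

```python
import re

# Hypothetical mapping from normalized SystemRequirements entries to conda
# package names; a real mapping file would be far larger.
CONDA_BY_SYSREQ = {
    "gnu make": "make",
    "pandoc": "pandoc",
    "libxml2": "libxml2",
}


def parse_system_requirements(field):
    """Split a SystemRequirements string into lowercase entries."""
    # Drop parenthesised notes such as version constraints "(>= 1.12.3)".
    without_notes = re.sub(r"\([^)]*\)", "", field)
    entries = []
    for part in re.split(r"[,;]", without_notes):
        token = part.strip().rstrip(".").lower()
        if token:
            entries.append(token)
    return entries


def map_to_conda(field):
    """Return (conda_packages, unmapped_entries) for one SystemRequirements field."""
    found, unknown = [], []
    for token in parse_system_requirements(field):
        if token in CONDA_BY_SYSREQ:
            found.append(CONDA_BY_SYSREQ[token])
        else:
            unknown.append(token)
    return found, unknown
```

Returning the unmapped entries separately, instead of silently dropping them, gives a recipe author a list of requirements that still need manual attention.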


dbast avatar dbast commented on September 16, 2024

@jdblischak Times have changed ... if you look at feedstocks at https://github.com/AnacondaRecipes you can see that lots/most of them have SPDX license strings specified for license: .... Now that both sides, the community and the defaults recipes, use SPDX, it makes sense for the skeletons to receive some updates.

Yes, something like sysreqsdb ... (it would be interesting to extend sysreqsdb to conda and use it inside the skeleton) ... doing a large build-out of CRAN would then also mean aggregating multiple recipes in one repo, as done by https://github.com/AnacondaRecipes/aggregateR or bioconda... otherwise conda-forge ends up with >10k new feedstock repos.

This can be all done step by step.. the ideas, concepts and unfinished code already exist to be picked up.


jdblischak avatar jdblischak commented on September 16, 2024

Times have changed

@dbast Agreed! And not only the licenses. I see you are now at Anaconda. Congrats on the new job!

otherwise conda-forge ends up with >10k new feedstock repos.

I would love it if we could have fewer repos. My inbox is inundated with conda-forge notifications, and I find it overwhelming. (I haven't had much luck adjusting my GitHub email notification settings. If anyone knows of a way to receive notifications for direct mentions and not team mentions, I would love to know how to do this.)

This can be all done step by step.. the ideas, concepts and unfinished code already exist to be picked up.

I'm inspired! Though I think these efforts would need some central coordination, especially if we want to move all the R packages into a single repo instead of individual feedstocks. Do you have the bandwidth to coordinate this? Maybe we start a discussion at https://github.com/orgs/conda-forge/teams/r to gauge interest and availability?

I had written off the existing cran skeleton since it was my understanding that grayskull was the future, e.g. conda/grayskull#7. But grayskull is still pre-1.0, and I see that CRAN support was added as a milestone for version 2.0: https://github.com/conda-incubator/grayskull/milestone/2 Thus it seems like it still makes sense to continue investing in improvements to the existing skeleton.


dbast avatar dbast commented on September 16, 2024

@jdblischak Thanks! I am happy to help here with coordination. Let's continue the discussion at https://github.com/orgs/conda-forge/teams/r

