Giter VIP home page Giter VIP logo

Comments (23)

Bobgy avatar Bobgy commented on August 22, 2024 3

I've added webhooks to the katib and pytorch operator repos as @PatrickXYS DMed.
Please verify if that works.

from internal-acls.

PatrickXYS avatar PatrickXYS commented on August 22, 2024 2

Thanks @Bobgy @andreyvelich

Here is the action item that is required to get started with two small repos:

  1. Give permission to aws-kf-ci-bot with this PR #353
  2. Add webhook to pytorch-operator and katib, since this part is kind-of secret, I'll DM you with specific steps.

Created a P0 ticket to keep track of webhook management process #354

from internal-acls.

jlewi avatar jlewi commented on August 22, 2024 2

Relying on @Bobgy to administer all the repos doesn't seem very sustainable. I can think of two viable paths

  1. Someone builds out the automation
  2. We pass on the toil for manual maintenance to individual repos

Regarding #2 as a possible interim solution; What privileges are needed to configure webhooks on the repository level? Is it admin?

I don't think we want to be handing out admin rights as a general rule as that allows for destructive actions. However, as an interim solution that might be ok and preferable to putting all of this toil on @Bobgy.

For the granting of admin rights to be an interim solution; I think we would have to meet the following conditions

i. There is a finite list of repos and individuals that will be granted admin rights

  • Once agreed upon future repos will not be added
    ii. There should be an agreed upon timeline for when these repos will be removed (3 months?)
    iii. After the 3 months privileges will be removed
    iv. Repo owners are responsible for ensuring the automation is ready within the time frame or else there tests might break once privileges are removed
    v. Anyone granted admin rights agrees to use them exclusively to manage webhooks in order to configure CI.

from internal-acls.

Bobgy avatar Bobgy commented on August 22, 2024 2

Thanks @jlewi for the proposal! I think one time configuration for two repos is okay for me, because webhook config is fairly simple, but as I imagine, there could be back and forces debugging the webhook and additional changes needed from time to time (when developing the new test infra). If that happens, I think it will be a good timing to grant temporary admin privileges as @jlewi suggested.

Thoughts?

from internal-acls.

jlewi avatar jlewi commented on August 22, 2024 1

I would much prefer at this point to use GitHub Apps as opposed to bot accounts and separate configuration of webhooks.

The only blocking issue is upstream work on Prow side to support it. I will have a check on the existing works done by Prow and see how far to support it

We want to rely on upstream prow for development work and features, then on our side, we made a little customization to make it work with AWS.

I'd like to suggest an alternative and understand if its unreasonable before continuing down the road of not using GitHub Apps.

Bypass prow for the triggering and execution of jobs.

Kubeflow tests currently work on triggering Tekton workflows described in prow_config.yaml.

How hard would it be to just create a GitHub App that does this? Most of the code we need is in run_e2e_workflow.py The missing pieces are

  1. A server to accept webhooks
  2. Calling out to the GitHub API to report status

Is this more expensive then the ongoing cost of manually configuring bots and webhooks and then migrating to apps later?

I don't think writing a go program to do the above is to difficult; the links above have example code for talking to GitHub APIs using GitHub credentials which is the hardest part. The program could shell out to run_e2e_workflow.py so we don't need to rewrite run_e2e_workflow.py.

This would also make it alot easier for people to setup their own test infra because I think all they need is Tekton + this app.

from internal-acls.

Bobgy avatar Bobgy commented on August 22, 2024 1

@andreyvelich Can you inform me what kind of configurations I need to do and to which repos?
Shall we start small with just one repo, verify this works e2e and you can scale to multiple repos when you finish the automation work?

from internal-acls.

Jeffwan avatar Jeffwan commented on August 22, 2024 1

Once bot has the permission to operate repo, then I can help on pytorch e2e tests. Test case is ready actually, we just need to test it e2e Jeffwan/pytorch-operator#4

from internal-acls.

PatrickXYS avatar PatrickXYS commented on August 22, 2024 1

So for the initial testing stage, there might be a back-and-forth process when debugging.

I would recommend granting myself with admin access to katib + pytorch-operator repo.

And this will be temporary permission granting and it also saves efforts for @Bobgy .

/cc @jlewi @Bobgy @Jeffwan @andreyvelich

from internal-acls.

issue-label-bot avatar issue-label-bot commented on August 22, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.78

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

from internal-acls.

PatrickXYS avatar PatrickXYS commented on August 22, 2024

/cc @Jeffwan @jlewi @Bobgy @andreyvelich @yuzisun

Any insights on this issue?

from internal-acls.

jlewi avatar jlewi commented on August 22, 2024

What about using a GitHub App to report back status on PRs?
https://developer.github.com/apps/differences-between-apps/

Do we need write permission just to update GitHub status?

It looks like a GitHub app can have access to just commit status which is what we need.

GitHub Apps are also associated with webhooks.

So one possibility would be

You create a GitHub App.
We install that GitHub App for the repos that are using your test infra.

Prow has an open issue kubernetes/test-infra#10423 about creating a GitHub app; it looks like there has been a little bit of activity.

We have go code here that might be useful for authenticating as a GitHub App
https://github.com/kubeflow/testing/blob/master/go/cmd/prctl/main.go

This code is creating PRs

from internal-acls.

PatrickXYS avatar PatrickXYS commented on August 22, 2024

I've already taking a look at the Github Apps Prow and tried it. To make it work, we need to maintain a customized prow.

We want to rely on upstream prow for development work and features, then on our side, we made a little customization to make it work with AWS.

Do you think we can manually handle permissions / webhook in POC stage? We can improve the process gradually.

@jlewi

from internal-acls.

jlewi avatar jlewi commented on August 22, 2024

@kubeflow/google-admins will be the folks administering the GitHub org; so they would be the ones that would have to do any manual work. So you'll need to get buy in from a majority of them.

from internal-acls.

issue-label-bot avatar issue-label-bot commented on August 22, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.86

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

from internal-acls.

Jeffwan avatar Jeffwan commented on August 22, 2024

Github Apps will be better option. See benefits here. https://docs.github.com/en/free-pro-team@latest/developers/apps/migrating-oauth-apps-to-github-apps

The only blocking issue is upstream work on Prow side to support it. I will have a check on the existing works done by Prow and see how far to support it

from internal-acls.

PatrickXYS avatar PatrickXYS commented on August 22, 2024

So currently action item should be:

  1. Add aws-kf-ci-bot account into bot team-list

  2. Add webhook to the repos that want to migrate to Kubeflow Shared Test-infra.

@Bobgy @rmgogogo @james-jwu @neuromage

Since you are all https://github.com/orgs/kubeflow/teams/google-admins, want to gather some feedback from you, let me know if any concerns

from internal-acls.

andreyvelich avatar andreyvelich commented on August 22, 2024

@jlewi Since we have to set new test infra until 17th of October, my suggestion is to manually update webhook for couple repositories.
Then we can start to migrate infra to AWS and unblock many PRs for Training WG.
That should be easier than creating new process and debug it.

As well, we don't have much time until Kubeflow 1.2 release and, I think, it would be great if we can migrate to the new infra before that.
In the meantime, we can create ticket with p0 priority for automation process of setting the webhooks and start work for it after the release.

What do you think @Bobgy @PatrickXYS @kubeflow/wg-training-leads ?

from internal-acls.

jlewi avatar jlewi commented on August 22, 2024

@andreyvelich I will not be the one manually administering the GitHub org and setting up webhooks so you'll need to get buy in from one of the other admins; otherwise this is not a viable path forward.

from internal-acls.

andreyvelich avatar andreyvelich commented on August 22, 2024

@Bobgy or anyone from @kubeflow/google-admins Can you help us with set the webhook, please?
/cc @Jeffwan @PatrickXYS

from internal-acls.

andreyvelich avatar andreyvelich commented on August 22, 2024

@Jeffwan @PatrickXYS Can you explain @Bobgy what change needs to be done to set the webhook, please?

My suggestion is to set webhook for repositories that have many blocking PRs before the Kubeflow 1.2 release.
From my point of view it is: pytorch-operator and katib.

@kubeflow/wg-training-leads What do you think ?

from internal-acls.

terrytangyuan avatar terrytangyuan commented on August 22, 2024

Setting webhooks for those two repos sounds good to me.

from internal-acls.

johnugeorge avatar johnugeorge commented on August 22, 2024

SGTM

from internal-acls.

Jeffwan avatar Jeffwan commented on August 22, 2024

I think POC is done and it works fine. @PatrickXYS You can revert changes in #357
If other repos plans to onboard new CI, I would suggest to apply the change once.

from internal-acls.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.