
Comments (49)

PatrickXYS avatar PatrickXYS commented on August 22, 2024 1

Should be maintainers rather than members

from internal-acls.

Jeffwan avatar Jeffwan commented on August 22, 2024 1

@PatrickXYS Training will probably deprecate Caffe2-operator soon, and it doesn't have any e2e tests defined. Is this just for debugging purposes?


jlewi avatar jlewi commented on August 22, 2024 1

@PatrickXYS you can go ahead and prepare the PR, and I can LGTM it with a hold so we're ready to merge it as soon as the kfctl results come in.


jlewi avatar jlewi commented on August 22, 2024 1

I updated the peribolos image to the latest image: gcr.io/k8s-prow/peribolos@sha256:a1248c8793d5c99ed3b31c8ad1e27348cdaf5e11abc501629c9d636921460e9b


PatrickXYS avatar PatrickXYS commented on August 22, 2024 1

https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Problem solved! Upgrading peribolos fixed the issue (not sure what the issue was, though).

Thanks a lot for helping @jlewi

/close


jlewi avatar jlewi commented on August 22, 2024 1

@PatrickXYS Thanks for your patience. I bet we both wish we could get those hours of our lives back.


issue-label-bot avatar issue-label-bot commented on August 22, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.60
kind/bug 0.69

Please mark this comment with 👍 or 👎 to give our bot feedback!


issue-label-bot avatar issue-label-bot commented on August 22, 2024

Issue-Label Bot is automatically applying the labels:

Label Probability
area/front-end 0.56

Please mark this comment with 👍 or 👎 to give our bot feedback!


Jeffwan avatar Jeffwan commented on August 22, 2024

With the correct permissions, I don't think the bot needs admin permission any more.
#357 should be discarded.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

I'll discard #357 after we confirm the proposed PR fixes the issue #362


PatrickXYS avatar PatrickXYS commented on August 22, 2024

It's happening again for the kfserving repo after PR #363 was merged.

image

/cc @chensun I'm assuming the same sync error happened again


Jeffwan avatar Jeffwan commented on August 22, 2024

Hmm... any chance we can figure out the root problem?


PatrickXYS avatar PatrickXYS commented on August 22, 2024

We might need to rely on Google folks to debug the sync issue. Because it is somehow related to GitHub permissions, I doubt we can find an easy way to debug it.


chensun avatar chensun commented on August 22, 2024

It's happening again for the kfserving repo after PR #363 was merged.

image

/cc @chensun I'm assuming the same sync error happened again

The error is

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:201","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow team third-party-bots repos: failed to update team 4183788(third-party-bots) permissions on repo kfserving to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}","time":"2020-10-29T17:37:31Z"}
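
To pull the failing team, repo, requested permission, and status code out of a log line like this one, a small Go helper can be sketched as follows. This is only a sketch: the regular expression follows the wording of the message quoted above and may not match other peribolos versions, and the names `parseFailures`, `logEntry`, and `failure` are hypothetical, not part of peribolos.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// logEntry mirrors only the fields of the JSON log line that are used here.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// failure is one "failed to update team ..." clause found in the message.
type failure struct {
	TeamID, Team, Repo, Permission, Status string
}

// Assumption: the pattern is derived from the log line quoted in this thread.
var failureRE = regexp.MustCompile(
	`failed to update team (\d+)\(([^)]+)\) permissions on repo (\S+) to (\w+): status code (\d+)`)

// parseFailures decodes one JSON log line and returns every failure clause in it.
func parseFailures(line string) ([]failure, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return nil, err
	}
	var out []failure
	for _, m := range failureRE.FindAllStringSubmatch(e.Msg, -1) {
		out = append(out, failure{m[1], m[2], m[3], m[4], m[5]})
	}
	return out, nil
}

func main() {
	// Shortened copy of the log line above.
	line := `{"level":"fatal","msg":"Configuration failed: failed to configure kubeflow team third-party-bots repos: failed to update team 4183788(third-party-bots) permissions on repo kfserving to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\"}"}`
	fs, err := parseFailures(line)
	if err != nil {
		panic(err)
	}
	for _, f := range fs {
		fmt.Printf("team=%s id=%s repo=%s permission=%s status=%s\n",
			f.Team, f.TeamID, f.Repo, f.Permission, f.Status)
	}
}
```

Because the helper collects every match, it also works on log lines that report several repos at once.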


PatrickXYS avatar PatrickXYS commented on August 22, 2024

@chensun Thanks, but how can we debug from this?

According to the GitHub docs, 422 means the third-party-bots team is not owned by the Kubeflow org:

You will get a 422 Unprocessable Entity status if you attempt to add a repository to a team that is not owned by the organization.

Is it possible for us to get access to the cluster to debug? Or can we find another way to make it work, maybe by manually re-running the sync script?

@jlewi


PatrickXYS avatar PatrickXYS commented on August 22, 2024

/cc @Bobgy Can you help run this script to sync manually?

This is a blocking issue for the Serving WG's migration to the shared test-infra.

/cc @kubeflow/wg-serving-leads


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS Why would running the script manually help? Isn't the script just executing the same thing our auto sync is running? Presumably the logs @chensun posted above are the relevant logs.


jlewi avatar jlewi commented on August 22, 2024

The group appears to be owned by kubeflow.
https://github.com/orgs/kubeflow/teams/third-party-bots

I wonder if it's possible that there is a name collision and peribolos isn't handling it correctly?

We have other groups like kfserving that are getting correctly sync'd
https://github.com/orgs/kubeflow/teams/kfserving-owners/repositories

Maybe just try it by making the group id sufficiently random e.g.

kf-third-party-bots-4327


jlewi avatar jlewi commented on August 22, 2024

How did you fix that problem before with the other repositories? e.g. katib and pytorch?


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Why would running the script manually help? Isn't the script just executing the same thing our auto sync is running?

Yes, I assume they're the same, but having a person run it gives us a chance to debug further. With our limited access, we can't help in any way while a machine runs the sync script automatically.

Maybe just try it by making the group id sufficiently random e.g.

Got it, let me try it.


jlewi avatar jlewi commented on August 22, 2024

It would also be worth trying to verify that the group id in the error message is correct

failed to update team 4183788(third-party-bots)


jlewi avatar jlewi commented on August 22, 2024

Here's a script that should allow you to get the team id
https://github.com/jlewi/kubeflow-dev/blob/master/get-team-id.sh
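
For reference, the same lookup can be sketched in Go against the GitHub REST endpoint that returns a team by its slug (`GET /orgs/{org}/teams/{team_slug}`). This is a minimal sketch, not a copy of the script linked above: the helper names `teamURL`, `decodeTeam`, and `lookupTeam` are hypothetical, and it assumes a token with permission to read org teams is available in a `GITHUB_TOKEN` environment variable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// team holds the subset of the GitHub team object needed to compare IDs.
type team struct {
	ID   int64  `json:"id"`
	Slug string `json:"slug"`
	Name string `json:"name"`
}

// teamURL builds the "get a team by name" URL for an org and team slug.
func teamURL(base, org, slug string) string {
	return fmt.Sprintf("%s/orgs/%s/teams/%s", base, url.PathEscape(org), url.PathEscape(slug))
}

// decodeTeam parses the JSON body returned by that endpoint.
func decodeTeam(body []byte) (team, error) {
	var t team
	err := json.Unmarshal(body, &t)
	return t, err
}

// lookupTeam performs the HTTP call and returns the decoded team.
func lookupTeam(token, org, slug string) (team, error) {
	req, err := http.NewRequest(http.MethodGet, teamURL("https://api.github.com", org, slug), nil)
	if err != nil {
		return team{}, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return team{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return team{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return team{}, err
	}
	return decodeTeam(body)
}

func main() {
	token := os.Getenv("GITHUB_TOKEN") // assumption: token supplied via the environment
	if token == "" {
		fmt.Println("set GITHUB_TOKEN to query the API")
		return
	}
	t, err := lookupTeam(token, "kubeflow", "third-party-bots")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("%s has id %d\n", t.Slug, t.ID)
}
```

The printed ID can then be compared with the one in the error message (4183788).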


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Update: I used this script to get the team ID and found that third-party-bots is not listed under the Kubeflow org in the output, although it does appear there in the UI.

I'll go ahead and randomize the team name to see if that works.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

/reopen

It doesn't work: https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories


k8s-ci-robot avatar k8s-ci-robot commented on August 22, 2024

@PatrickXYS: Reopened this issue.

In response to this:

/reopen

It doesn't work: https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Can we add the bot account to the ci-bot team? https://github.com/kubeflow/internal-acls/blob/master/github-orgs/kubeflow/org.yaml#L723-L728

Giving write access only to kubeflow repos should be okay, I think. Does anyone have concerns about this?

This sync issue keeps blocking us from helping WG folks migrate to the shared test-infra.

Besides that, I don't know what else I can do to help debug the sync issue.

/cc @Jeffwan @Bobgy @jlewi


jlewi avatar jlewi commented on August 22, 2024

I used this script to get the team ID and found that third-party-bots is not listed under the Kubeflow org in the output, although it does appear there in the UI.

It looks like the team account was created and two repos were added.
https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Giving write access only to kubeflow repos should be okay, I think. Does anyone have concerns about this?

Yes. I believe ci-bots have write access to all/most repos and the aws bots should not have access to all of them.

@chensun Can you attach the latest logs?

@PatrickXYS I think you might need to follow up with the prow team once @chensun attaches the logs

One possible workaround would be for folks to fork the code into their own GitHub org. I believe @kubeflow/wg-automl-leads was already looking into setting up their own GitHub org for docker images, so this might be another reason to do that.


jlewi avatar jlewi commented on August 22, 2024

Here are the logs. It looks like it's the same error.

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:201","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow team third-party-bots-1321 repos: [failed to update team 4230878(third-party-bots-1321) permissions on repo tf-operator to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo xgboost-operator to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo kfctl to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo kfserving to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo manifests to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}]","time":"2020-11-02T14:12:35Z"}

github-sync-logs.txt


jlewi avatar jlewi commented on August 22, 2024

This is very weird; it looks like there are now 3 repos
https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

  • kfserving added
  • katib added
  • pytorch-operator


jlewi avatar jlewi commented on August 22, 2024

I added the other repositories manually. Let's see what happens.


jlewi avatar jlewi commented on August 22, 2024

It looks like the sync is still crashing, which will block other updates to the GitHub org.

@PatrickXYS Can you try adding the repositories one at a time so we can narrow it down to a specific repository? Maybe try tf-operator?


PatrickXYS avatar PatrickXYS commented on August 22, 2024

I added the other repositories manually. Let's see what happens.

Thanks @jlewi

Can you try adding the repositories one at a time so we can narrow it down to a specific repository? Maybe try tf-operator?

This makes sense. Since tf-operator has already been added, I'll add caffe2-operator as the experiment.


jlewi avatar jlewi commented on August 22, 2024

This makes sense. Since tf-operator has already been added, I'll add caffe2-operator as the experiment.

If you look at the error, the sync is crashing on a bunch of repos.

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:201","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow team third-party-bots-1321 repos: [failed to update team 4230878(third-party-bots-1321) permissions on repo tf-operator to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo xgboost-operator to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo kfctl to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo kfserving to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}, failed to update team 4230878(third-party-bots-1321) permissions on repo manifests to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}]","time":"2020-11-02T14:12:35Z"}

Including tf-operator. So I think we want to start adding the repos one by one; e.g. start by adding katib and see if the sync works. If it does, then add kfserving, etc.
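
The one-at-a-time approach amounts to growing the repo list until a sync fails. A minimal Go sketch of that loop is below. It assumes a hypothetical `syncFunc` callback that stands in for "merge a PR with this repo list and wait for the github-sync job"; here the callback is simulated, so the names and the failure condition are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a placeholder error used by the simulated sync below.
var errSimulated = errors.New("simulated sync failure")

// syncFunc stands in for applying the config with the given repo list.
type syncFunc func(repos []string) error

// addOneByOne adds candidates one at a time. It returns the repos that synced,
// the first repo whose addition made the sync fail ("" if none), and the error.
func addOneByOne(candidates []string, sync syncFunc) ([]string, string, error) {
	var synced []string
	for _, repo := range candidates {
		// Copy before appending so a failed trial never alters synced.
		trial := append(append([]string{}, synced...), repo)
		if err := sync(trial); err != nil {
			return synced, repo, err
		}
		synced = trial
	}
	return synced, "", nil
}

func main() {
	// Simulated sync: fails whenever the (hypothetically bad) repo is in the list.
	fakeSync := func(repos []string) error {
		for _, r := range repos {
			if r == "kfctl" {
				return errSimulated
			}
		}
		return nil
	}
	synced, culprit, err := addOneByOne([]string{"katib", "kfctl", "kfserving"}, fakeSync)
	fmt.Println("synced:", synced, "culprit:", culprit, "err:", err)
}
```

Stopping at the first failure keeps the org config in a state that still syncs, which matters because a crashing sync blocks every other org update.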


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Sounds good. I started from the top of the repo list: katib first, then kfctl ...

#371

Can you approve the PR so we can see what the logs look like?


PatrickXYS avatar PatrickXYS commented on August 22, 2024

FYI, what do the logs look like now? Let me know whether there's a sync issue, and I can move forward with adding other repos.


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS The sync completed successfully. The team owns the katib repo.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Moving forward with the kfctl repo, PR: #373


PatrickXYS avatar PatrickXYS commented on August 22, 2024

Only katib access here: https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Since kfserving is also under active development, can we move fast to fix the issue?

/cc @jlewi @kubeflow/wg-serving-leads


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS Sure. Just add KFServing next.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

kfserving PR #374

Looks like kfctl is not yet synced: https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

/cc @jlewi


jlewi avatar jlewi commented on August 22, 2024

kfctl should be running now

 kubectl --context=github-admin get jobs --sort-by=".metadata.creationTimestamp"
NAME                         COMPLETIONS   DURATION   AGE
label-sync-1008-002          1/1           12m        635d
github-sync-f65bw            0/1           467d       467d
github-sync-vtzcm            1/1           17s        467d
label-syncr9dgh              0/1           432d       432d
github-sync-7bw44            1/1           20s        424d
label-syncs88xh              0/1           424d       424d
github-sync-7q6ht            0/1           416d       416d
github-sync-c5577            0/1           165d       165d
label-sync-cron-1601964000   0/1           27d        27d
label-sync-cron-1601985600   0/1           27d        27d
github-sync-1604346000       0/1           128m       128m
github-sync-1604350800       1/1           14m        47m
github-sync-1604352000       1/1           14m        33m
github-sync-1604352600       1/1           14m        18m
github-sync-1604353200       0/1           4m6s       4m6s

So it should be synced once github-sync-1604353200 finishes.


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS It failed to update kfctl.

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:201","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow team third-party-bots-1321 repos: failed to update team 4230878(third-party-bots-1321) permissions on repo kfctl to write: status code 422 not one of [204], body: {\"message\":\"Validation Failed\",\"documentation_url\":\"https://docs.github.com/rest/reference/teams#add-or-update-team-repository-permissions\"}","time":"2020-11-02T21:50:24Z"}


PatrickXYS avatar PatrickXYS commented on August 22, 2024

I removed kfctl for now and added kfserving.

The error message is weird: the sync works for katib but fails for kfctl. I think we need to reach out to the Prow folks and ask about peribolos.

/cc @jlewi


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS there are probably a couple of things we should do

  • Upgrade to the latest peribolos; I can try that
  • Take a quick look at the peribolos code to see what's going on

It's not clear to me whether this is a

  • bug in peribolos
  • bug in GitHub
  • a weird misconfiguration on our end.

We may need to try to repro the issue by writing a small piece of Go code that calls the GitHub API and tries to update the repo, so we can see whether the issue is in peribolos or on the GitHub side.
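
A minimal sketch of such a repro, calling the "add or update team repository permissions" endpoint (`PUT /orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}`) directly with net/http. Several things here are assumptions: the helper name `newTeamRepoRequest` is hypothetical, the token is read from a `GITHUB_TOKEN` environment variable, and trying both `write` and `push` is only a guess at something worth checking. The log shows the sync requesting `write`, while the REST documentation lists `pull`, `triage`, `push`, `maintain`, and `admin` as permission values; nothing in this thread confirms that this difference is the cause. Note that a successful call really changes the team's permission on the repo.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// newTeamRepoRequest builds the PUT request that sets a team's permission on a repo.
// The repo owner is assumed to be the same org that owns the team.
func newTeamRepoRequest(base, token, org, teamSlug, repo, permission string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"permission": permission})
	if err != nil {
		return nil, err
	}
	u := fmt.Sprintf("%s/orgs/%s/teams/%s/repos/%s/%s", base,
		url.PathEscape(org), url.PathEscape(teamSlug), url.PathEscape(org), url.PathEscape(repo))
	req, err := http.NewRequest(http.MethodPut, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	token := os.Getenv("GITHUB_TOKEN") // assumption: an org admin token in the environment
	if token == "" {
		fmt.Println("set GITHUB_TOKEN to run the repro")
		return
	}
	// Try both spellings and compare the status codes (204 means success).
	for _, perm := range []string{"write", "push"} {
		req, err := newTeamRepoRequest("https://api.github.com", token,
			"kubeflow", "third-party-bots-1321", "kfctl", perm)
		if err != nil {
			fmt.Println("building request failed:", err)
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("permission=%q -> %s\n", perm, resp.Status)
		resp.Body.Close()
	}
}
```

If the direct call returns the same 422 for a repo that peribolos fails on, the problem is more likely on the GitHub or configuration side; if it succeeds, peribolos becomes the main suspect.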


PatrickXYS avatar PatrickXYS commented on August 22, 2024

@jlewi Yeah, I think we should try the other experiments you mentioned before we report to the Prow team.

Let's see what happens after peribolos is upgraded.


jlewi avatar jlewi commented on August 22, 2024

@PatrickXYS peribolos should be upgraded now.
It sync'd
https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories
So KFServing is now one of the team's repositories.


PatrickXYS avatar PatrickXYS commented on August 22, 2024

I'm thinking of sending the PR for kfctl again to see if the upgraded peribolos fixes the issue.

kfctl PR: #377


PatrickXYS avatar PatrickXYS commented on August 22, 2024

kfctl is back https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Maybe the upgraded peribolos fixed the issue.

I'll send another PR to add the rest of the repos all at once.


k8s-ci-robot avatar k8s-ci-robot commented on August 22, 2024

@PatrickXYS: Closing this issue.

In response to this:

https://github.com/orgs/kubeflow/teams/third-party-bots-1321/repositories

Problem solved! Upgrading peribolos fixed the issue (not sure what the issue was, though).

Thanks a lot for helping @jlewi

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

