oss-test-infra's Introduction

Copyright 2018 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

OSS PROW

oss-test-infra's People

Contributors

2nd-half, a-crate, bentheelder, bobgy, chaodaig, chases2, chensun, chizhg, cjwagner, connor-mccarthy, ericedens, fejta, google-oss-prow[bot], google-oss-robot, haiyanmeng, hopkiw, jaimehisao, jcnars, kchernyshev, koln67, krzyzacy, michelle192837, mpherman2, nan-yu, nareddyt, sdowell, sethvargo, taoxuy, zijianjoy, zmarano

oss-test-infra's Issues

approve enabled in repos without OWNERS files

Repos with this problem:

{
 insertId: "3206wog42gdrsb"  
 jsonPayload: {
  author: "adjackura"   
  component: "hook"   
  event-GUID: "a5d1a300-ebb8-11e9-9868-406a0c1c7f7f"   
  event-type: "issue_comment"   
  file: "prow/plugins/approve/approvers/owners.go:164"   
  func: "k8s.io/test-infra/prow/plugins/approve/approvers.Owners.GetSuggestedApprovers"   
  level: "warning"   
  msg: "Couldn't find/suggest approvers for each files. Unapproved: [""]"   
  org: "GoogleCloudPlatform"   
  plugin: "approve"   
  pr: 2   
  repo: "guest-agent"   
  url: "https://github.com/GoogleCloudPlatform/guest-agent/pull/2#issuecomment-540843760"   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/hook"  
 receiveTimestamp: "2019-10-10T23:49:58.731720740Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2019-10-10T23:49:53Z"  
}

unable to rerun jobs

When I click rerun, we return a 200 response with the message "You don't have permission to rerun that job".

Autobump fails

/assign @cjwagner

https://prow.gflocks.com/view/gcs/oss-prow/logs/ci-test-infra-autobump-prow/1182074783416717312

Bumping: 'gcr.io/k8s-prow/' images to v20191009-0492f1d6e ...
Attempting to bump the following files: prow/cluster/cert-manager.yaml prow/cluster/cluster.yaml prow/cluster/grandmatriarch_default.yaml prow/cluster/grandmatriarch_test-pods.yaml prow/config.yaml prow/prowjobs/GoogleCloudPlatform/oss-test-infra/gcp-oss-test-infra-config.yaml
bump.sh completed successfully!
[master 7ac2309] Bump prow from v20191008-bbd30bc09 to v20191009-0492f1d6e
 5 files changed, 15 insertions(+), 15 deletions(-)
Pushing commit to google-oss-robot/oss-test-infra:autobump...
remote: Repository not found.
fatal: repository 'https://google-oss-robot:@github.com/google-oss-robot/oss-test-infra/' not found
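
One detail in the failing remote URL is that nothing follows the colon after `google-oss-robot`, so the password (token) field is empty. Whether that is the actual cause of the "Repository not found" error is not established in this log; the fork may also simply not exist. A minimal Python sketch for spotting an empty credential in a remote URL, where the helper name `has_empty_password` is hypothetical:

```python
from urllib.parse import urlsplit


def has_empty_password(remote_url: str) -> bool:
    """Return True when the URL has a 'user:' prefix with nothing after the colon."""
    parts = urlsplit(remote_url)
    # password is None when the userinfo has no colon at all, and ""
    # when the colon is present but nothing follows it.
    return parts.username is not None and parts.password == ""


# URL copied from the failing push in the log.
failing = "https://google-oss-robot:@github.com/google-oss-robot/oss-test-infra/"
print(has_empty_password(failing))
```

A check like this could run before the push step so that a missing token is reported directly instead of surfacing as a repository lookup failure.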

Setup Prow job rerun

It would be great to expose this feature to existing prow users. The infrastructure is in place - it just needs to be configured.

Tasks:

  • A GoogleCloudPlatform owner to create a Github Oauth App (step 1 and 2)
  • Define which users or teams will be delegated rerun permission.
  • Update the deck Kubernetes deployment and secrets (github-oauth-config and cookie) - #100

Create a dedicated build cluster

This allows for isolation between the Prow cluster and the jobs running on it. Furthermore, this enables the introduction of "trusted" jobs for general deployments, image builds, and prow bumps in order to automate repeatable, mundane tasks without exposing the cluster (or secrets) to unprivileged users.

  • Create a GCP Prow build cluster.
  • Migrate jobs to the build cluster.
  • Set the build cluster as the default cluster.

/assign

Enable ghproxy for Github API calls

Enable ghproxy for Github API calls to limit token usage.

#100

@cjwagner - We'll probably want to set up ghproxy for this cluster since listing the members of this team currently costs 25 API ratelimit tokens. That being said, the cache entry would become invalid pretty quickly as I suspect that team has high churn and this bot token isn't heavily utilized.
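
To put the 25-token cost in perspective, here is a back-of-the-envelope sketch. It assumes the bot token is subject to the usual 5,000 requests per hour limit for authenticated GitHub API calls; that figure is an assumption, not something stated in this thread.

```python
HOURLY_LIMIT = 5000      # assumption: standard hourly limit for an authenticated token
COST_PER_LISTING = 25    # from the comment: one team-member listing costs 25 tokens


def listings_per_hour(limit: int = HOURLY_LIMIT, cost: int = COST_PER_LISTING) -> int:
    """How many uncached team listings fit into one hour of rate limit."""
    return limit // cost


print(listings_per_hour())  # 200
```

Under that assumption, uncached listings alone would exhaust the budget after 200 calls per hour, before any other API traffic is counted.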

/assign

Cannot list pods in kubeflow cluster

{
 insertId: "1e8iuueg2rjyho6"  
 jsonPayload: {
  component: "plank"   
  error: "errors syncing: [error listing pods in cluster "kubeflow": Get https://35.196.213.148/api/v1/namespaces/test-pods/pods?labelSelector=created-by-prow%3Dtrue: dial tcp 35.196.213.148:443: connect: connection refused], errors reporting: []"   
  file: "prow/cmd/plank/main.go:149"   
  func: "main.main.func2"   
  level: "error"   
  msg: "Error syncing."   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/plank"  
 receiveTimestamp: "2020-01-21T19:02:28.929499401Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2020-01-21T19:02:24Z"  
}

/kind oncall-hotlist

plank.allow_cancellations is deprecated

{
 insertId: "wqxj9cg6l82b79"  
 jsonPayload: {
  component: "tide"   
  file: "prow/config/config.go:1258"   
  func: "k8s.io/test-infra/prow/config.parseProwConfig"   
  level: "warning"   
  msg: "The `plank.allow_cancellations` setting is deprecated. It will be removed and set to always true in March 2020"   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/tide"  
 receiveTimestamp: "2019-12-04T22:56:59.879872048Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2019-12-04T22:56:55Z"  
}

/kind oncall-hotlist

Use workload identity for prow

We should stop sending this flag:

- --gcs-credentials-file=/etc/robot/service-account.json

We already define a service account:

serviceAccountName: gerrit

We should annotate it with GCP SA rights: https://github.com/kubernetes/test-infra/tree/master/experiment/workload-identity

If the flag is unset then it will use default creds:
https://github.com/kubernetes/test-infra/blob/master/prow/cmd/gerrit/main.go#L240

https://github.com/kubernetes/test-infra/blob/164c5e85105f85bb8fd8c181a085911a3d1010fd/pkg/io/opener.go#L57-L60

https://godoc.org/cloud.google.com/go/storage#hdr-Creating_a_Client

Work:

  • Migrate gerrit - #210
  • Migrate grandmatriarch - #224 #226
  • Migrate pod utils
  • Migrate testgrid image upload - #219
  • Migrate prow upgrade - #221
  • Migrate gcp-guest - (issue #223)
  • Migrate espv2 - issue #222 - #227
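
As a rough illustration of the first step (no longer sending the credentials flag), the sketch below filters it out of a container's argument list. It only handles the `--flag=value` form shown in this issue; the function name and the second sample argument are placeholders, not taken from the real deployment.

```python
def without_flag(args, flag="--gcs-credentials-file"):
    """Return a copy of args with every occurrence of the given flag removed."""
    return [a for a in args if a != flag and not a.startswith(flag + "=")]


# First entry is the flag quoted in the issue; the second is a made-up placeholder.
sample_args = [
    "--gcs-credentials-file=/etc/robot/service-account.json",
    "--some-other-flag=value",
]
print(without_flag(sample_args))
```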

Spyglass no longer shows build-log.txt output

Starting today, users trying to view prow jobs via the 'gubernator' interface report seeing only the main interface elements (logo, job history, artifacts link etc.) followed by a blank black page, where previously the output of build-log.txt was present. The build log is still discoverable through the artifacts link.

testgrid config check failing for `ci-test-infra-resultstore-upload`

This is blocking a PR for esp-v2: #317

==================== Test output for //config/tests/testgrids:go_default_test:
--- FAIL: TestKubernetesProwInstanceJobsMustHaveMatchingTestgridEntries (3.20s)
    config_test.go:436: Job ci-test-infra-resultstore-upload does not have a matching testgrid testgroup

This was working earlier in the day. I don't know how it started failing, as there were no other PRs (other than esp-v2).
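
The failing check is a Go test; the Python sketch below only illustrates the general idea of such a check, assuming jobs and test groups are matched by exact name. The function name and the sample test group list are hypothetical.

```python
def jobs_missing_testgroups(job_names, testgroup_names):
    """Return the job names that have no identically named test group, sorted."""
    return sorted(set(job_names) - set(testgroup_names))


# The first job name comes from the failure output; the rest is made-up sample data.
jobs = ["ci-test-infra-resultstore-upload", "some-other-job"]
testgroups = ["some-other-job"]
print(jobs_missing_testgroups(jobs, testgroups))
```

If the job was recently added or renamed upstream without a matching test group, a check of this shape would start failing without any change in this repo.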

Add Knative build controller

What would you like to be added:
Add knative build controller to oss-test-infra prow instance.
Why is this needed:
We want to use knative build step functionality

RBAC rules are incorrect

#159 (comment)

textPayload: "error retrieving resource lock default/prow-sinker-leaderlock: configmaps "prow-sinker-leaderlock" is forbidden: User "system:serviceaccount:default:sinker" cannot get resource "configmaps" in API group "" in the namespace "default""

textPayload: "external/io_k8s_client_go/tools/cache/reflector.go:98: Failed to watch *v1.ProwJob: unknown (get prowjobs.prow.k8s.io)"

Use ghProxy by May 2020

Getting this error on startup:

jsonPayload: {
  component: "tide"   
  file: "prow/flagutil/github.go:90"   
  func: "k8s.io/test-infra/prow/flagutil.(*GitHubOptions).Validate"   
  level: "error"   
  msg: "It doesn't look like you are using ghproxy to cache API calls to GitHub! This has become a required component of Prow and other components will soon be allowed to add features that may rapidly consume API ratelimit without caching. Starting May 1, 2020 use Prow components without ghproxy at your own risk! https://github.com/kubernetes/test-infra/tree/master/ghproxy#ghproxy" 

crier failing to report kunit-test-presubmit

{
 insertId: "1io8wcxf6ejq1m"  
 jsonPayload: {
  component: "crier"   
  error: "error setting status: status code 404 not one of [201], body: {"message":"Not Found","documentation_url":"https://developer.github.com/v3/repos/statuses/#create-a-status"}"   
  file: "prow/crier/controller.go:279"   
  func: "k8s.io/test-infra/prow/crier.(*Controller).processNextItem"   
  jobName: "b3623ad8-c82f-11e9-b096-0a580a2c044d"   
  jobStatus: {
   build_id: "1166055611327057920"    
   completionTime: "2019-08-26T18:43:06Z"    
   description: "Job succeeded."    
   pod_name: "b3623ad8-c82f-11e9-b096-0a580a2c044d"    
   prev_report_states: {…}    
   startTime: "2019-08-26T18:31:21Z"    
   state: "success"    
   url: "https://prow.gflocks.com/view/gcs/oss-prow/pr-logs/pull/kunit-review.googlesource.com_linux/2370/kunit-test-presubmit/1166055611327057920"    
  }
  level: "error"   
  msg: "failed to report job"   
  prowjob: "default/b3623ad8-c82f-11e9-b096-0a580a2c044d"   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/crier"  
 receiveTimestamp: "2019-09-28T00:07:44.841801148Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2019-09-28T00:07:41Z"  
}

That's an odd URL.

/kind oncall-hotlist

Gerrit adapter fails to query changes for project `linux`.

Example:

{
  component: "gerrit"   
  error: "failed to query gerrit changes: API call to https://kunit-review.googlesource.com/a/changes/?n=5&o=CURRENT_REVISION&o=CURRENT_COMMIT&o=CURRENT_FILES&o=MESSAGES&q=project:linux failed: 403 Forbidden"   
  file: "prow/gerrit/client/client.go:276"   
  func: "k8s.io/test-infra/prow/gerrit/client.(*gerritInstanceHandler).queryAllChanges"   
  level: "error"   
  msg: "fail to query changes for project linux"   
}

Interestingly, these errors happen every minute, but only between the 42nd and 49th minutes of each hour.
Outside of this window queries appear to succeed. Example:

{
  component: "gerrit"   
  file: "prow/gerrit/client/client.go:315"   
  func: "k8s.io/test-infra/prow/gerrit/client.(*gerritInstanceHandler).queryChangesForProject"   
  level: "info"   
  msg: "Find 5 changes from query [project:linux]"   
}
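
One way to confirm the minute-of-hour pattern is to bucket the error timestamps by minute. This is a sketch only: the excerpts above do not include timestamps, so it assumes they look like the `timestamp` fields in other log entries from this cluster (for example `2019-10-10T23:49:53Z`), and the sample values are made up.

```python
from collections import Counter
from datetime import datetime


def errors_by_minute(timestamps):
    """Count log entries per minute of the hour (0-59)."""
    counts = Counter()
    for ts in timestamps:
        parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        counts[parsed.minute] += 1
    return counts


# Hypothetical timestamps for illustration.
sample = ["2020-01-01T10:42:05Z", "2020-01-01T11:42:30Z", "2020-01-01T10:49:00Z"]
for minute, count in sorted(errors_by_minute(sample).items()):
    print(minute, count)
```

If the real data clusters in the same minutes across many hours, that would point toward something periodic (a scheduled job or credential refresh) rather than random failures.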

/kind bug

Replace Make with Bazel

Currently, this repo uses Makefiles to perform build and deploy operations. We would like to replace each make target with its equivalent Bazel command.

/help

Tide is failing to set status in GoogleCloudPlatform/compute-image-tools

Example:

{
  component: "tide"   
  controller: "status-update"   
  error: "status code 404 not one of [201], body: {"message":"Not Found","documentation_url":"https://developer.github.com/v3/repos/statuses/#create-a-status"}"   
  file: "prow/tide/status.go:393"   
  func: "k8s.io/test-infra/prow/tide.(*statusController).setStatuses.func1"   
  level: "error"   
  msg: "Failed to set status context from "PENDING" to "pending"."   
  org: "GoogleCloudPlatform"   
  pr: 1150   
  repo: "compute-image-tools"   
  sha: "1b3f5263ce6c6b6984dbc5539f157c837dabeae3"   
}

Since this is only happening on the compute-image-tools repo, I suspect that a repo setting was accidentally changed. Perhaps the bot's permissions were demoted or removed?
cc @fejta @hopkiw

Gerrit fails to replace pod on bump

After running make -C prow deploy, the other pods updated, but gerrit is stuck in ContainerCreating mode. The new pod won't spin up, and the old pod won't terminate.

NAME                                  READY   STATUS              RESTARTS   AGE
pod/crier-75b645fbf9-lwcrv            1/1     Running             0          9m1s
pod/deck-795595cf75-skpfj             1/1     Running             0          9m2s
pod/deck-795595cf75-xlnv2             1/1     Running             0          9m2s
pod/gerrit-cf9cccc74-2xjr8            0/1     ContainerCreating   0          9m1s
pod/gerrit-dc4987ccb-r555v            1/1     Running             0          26h
pod/grandmatriarch-6c54cb845c-99fph   1/1     Running             0          8m59s
pod/hook-69548ff9bb-7p9kk             1/1     Running             0          9m
pod/hook-69548ff9bb-ddm9m             1/1     Running             0          9m1s
pod/horologium-558f7c4596-cdr6q       1/1     Running             0          9m2s
pod/plank-6d5f544979-m62qq            1/1     Running             0          9m3s
pod/sinker-7d44548f88-bd4jw           1/1     Running             0          9m3s
pod/tide-69ddd6c949-bw59j             1/1     Running             0          9m
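
The stuck pod can be picked out of a listing like this mechanically. A rough sketch, assuming the column layout shown (NAME, READY, STATUS, RESTARTS, AGE); the function name is hypothetical and the sample is a subset of the rows from this report.

```python
def pods_not_running(listing: str):
    """Return (name, status) pairs for rows whose STATUS is not 'Running'."""
    stuck = []
    lines = listing.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete pod row
        name, status = fields[0], fields[2]
        if status != "Running":
            stuck.append((name, status))
    return stuck


SAMPLE = """
NAME                                  READY   STATUS              RESTARTS   AGE
pod/deck-795595cf75-xlnv2             1/1     Running             0          9m2s
pod/gerrit-cf9cccc74-2xjr8            0/1     ContainerCreating   0          9m1s
pod/gerrit-dc4987ccb-r555v            1/1     Running             0          26h
"""
print(pods_not_running(SAMPLE))
```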

Add support for testgrid annotations on prowjobs

Can we get your utility set up on this repo to a) automatically create/update a testgrid.yaml based on the annotations on prowjobs in this repo and b) create kubernetes/test-infra PRs to get this added to testgrid.k8s.io?

/assign @chases2

Use workload identity for esp-v2

/assign @hopkiw

ref #202

  • Run enable-workload-identity.sh on the build cluster
  • Run bind-service-accounts.sh for the service account pair
    • secretName: service-account
    • secretName: cloudesf-testing-github-prow-service-account
  • Ensure job does not depend on the secret.json file existing, or the GOOGLE_APPLICATION_CREDENTIALS environment variable being set
  • Update job to use a serviceAccountName, stop using the secret (example: #221)
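
The last two checklist items can be sketched as a lint over a job's pod spec. This is a minimal illustration, assuming the spec is available as a plain dict with standard Kubernetes field names (`serviceAccountName`, `volumes[].secret.secretName`, `containers[].env[].name`); the function name and the sample spec are hypothetical.

```python
def credential_leftovers(pod_spec: dict, secret_names):
    """List reasons a pod spec still looks tied to secret-based credentials."""
    findings = []
    if not pod_spec.get("serviceAccountName"):
        findings.append("no serviceAccountName set")
    for volume in pod_spec.get("volumes", []):
        secret = (volume.get("secret") or {}).get("secretName")
        if secret in secret_names:
            findings.append(f"volume {volume.get('name')} mounts secret {secret}")
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            if env.get("name") == "GOOGLE_APPLICATION_CREDENTIALS":
                findings.append(
                    f"container {container.get('name')} sets GOOGLE_APPLICATION_CREDENTIALS"
                )
    return findings


# Secret names are the two listed in the checklist; the spec itself is made up.
SECRETS = {"service-account", "cloudesf-testing-github-prow-service-account"}
sample_spec = {
    "containers": [
        {"name": "test", "env": [{"name": "GOOGLE_APPLICATION_CREDENTIALS", "value": "/creds/secret.json"}]}
    ],
    "volumes": [{"name": "creds", "secret": {"secretName": "service-account"}}],
}
for finding in credential_leftovers(sample_spec, SECRETS):
    print(finding)
```

An empty result would suggest the job is ready to rely on workload identity alone, though it does not prove the job's own scripts never read the secret file.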

https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

Use crier to report to github

I'm seeing

{
 insertId: "1fm21jlg7mcygnw"  
 jsonPayload: {
  component: "plank"   
  file: "prow/flagutil/github.go:107"   
  func: "k8s.io/test-infra/prow/flagutil.(*GitHubOptions).GitHubClientWithLogFields"   
  level: "warning"   
  msg: "empty -github-token-path, will use anonymous github client"   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/plank"  
 receiveTimestamp: "2019-12-05T22:09:39.247181734Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2019-12-05T22:08:53Z"  
}

messages

See istio/test-infra#1976 for something similar

/kind oncall-hotlist

Migrate to default_decoration_configs

{
 insertId: "1l5t5s7fulrh9c"  
 jsonPayload: {
  component: "hook"   
  file: "prow/config/config.go:986"   
  func: "k8s.io/test-infra/prow/config.(*Config).finalizeJobConfig"   
  level: "warning"   
  msg: "default_decoration_config will be deprecated on April 2020, and it will be replaced with default_decoration_configs['*']."   
 }
 labels: {…}  
 logName: "projects/oss-prow/logs/hook"  
 receiveTimestamp: "2019-10-15T18:10:06.542949020Z"  
 resource: {…}  
 severity: "ERROR"  
 timestamp: "2019-10-15T18:10:05Z"  
}
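
The migration the warning asks for amounts to moving one value under a `'*'` key. A minimal sketch, assuming the setting lives under the `plank` section of the Prow config and that the config has already been loaded into a dict; the function name and the sample value are hypothetical.

```python
import copy


def migrate_decoration_config(prow_config: dict) -> dict:
    """Return a copy with default_decoration_config moved to default_decoration_configs['*']."""
    migrated = copy.deepcopy(prow_config)
    plank = migrated.get("plank", {})
    if "default_decoration_config" in plank:
        old = plank.pop("default_decoration_config")
        # Keep an existing '*' entry if one is already present.
        plank.setdefault("default_decoration_configs", {}).setdefault("*", old)
    return migrated


# Made-up sample value for illustration.
before = {"plank": {"default_decoration_config": {"timeout": "2h"}}}
print(migrate_decoration_config(before))
```

The input dict is left untouched, so the old and new forms can be compared before anything is written back to the config file.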

/assign
/kind oncall-hotlist
