jenkinsci / google-compute-engine-plugin

Home Page: https://plugins.jenkins.io/google-compute-engine/

License: Apache License 2.0

Makefile 0.26% Java 90.02% HTML 8.11% PowerShell 1.32% Shell 0.28%

google-cloud

google-compute-engine-plugin's Issues

Allow selecting image family only

A GCE image family will point to the latest image.

GCE plugin users should be able to select only the image project and image family instead of the image family and specific image name.

For example, the image family ubuntu-1804-lts points to ubuntu-1804-bionic-v20190514.

$ gcloud compute images describe-from-family ubuntu-1804-lts --project=ubuntu-os-cloud
archiveSizeBytes: '9522384640'
creationTimestamp: '2019-05-14T20:02:56.234-07:00'
description: Canonical, Ubuntu, 18.04 LTS, amd64 bionic image built on 2019-05-14
diskSizeGb: '10'
family: ubuntu-1804-lts
guestOsFeatures:
- type: VIRTIO_SCSI_MULTIQUEUE
id: '8613129354617438128'
kind: compute#image
labelFingerprint: 42WmSpB8rSM=
licenseCodes:
- '5926592092274602096'
licenses:
- https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/licenses/ubuntu-1804-lts
name: ubuntu-1804-bionic-v20190514
rawDisk:
  containerType: TAR
  source: ''
selfLink: https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20190514
sourceType: RAW
status: READY
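
A minimal sketch of how a configuration could reference an image family rather than a pinned image. The Compute Engine API accepts a URL of the form `projects/<project>/global/images/family/<family>` as a disk source image and resolves it to the family's latest image. The class and method names below are hypothetical and are not part of the plugin.

```java
/** Hypothetical helper, not part of the plugin. */
final class ImageFamilyUrls {
    private static final String API_ROOT = "https://www.googleapis.com/compute/v1";

    private ImageFamilyUrls() {}

    /** Builds the source-image URL that points at an image family instead of a specific image. */
    static String familyUrl(String imageProject, String family) {
        if (imageProject == null || imageProject.isEmpty()
                || family == null || family.isEmpty()) {
            throw new IllegalArgumentException("image project and family are required");
        }
        return API_ROOT + "/projects/" + imageProject + "/global/images/family/" + family;
    }
}
```

With the example above, `ImageFamilyUrls.familyUrl("ubuntu-os-cloud", "ubuntu-1804-lts")` yields a URL ending in `/global/images/family/ubuntu-1804-lts`, so the user would only need to pick the image project and the family.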

Code Cleanup: Remove exposed compute client in ComputeEngineCloud

There are examples throughout the code base of accessing the ComputeEngineCloud's compute client through an exposed field. This creates tight coupling through an undocumented dependence on the initialization logic and timing within the ComputeEngineCloud. Ideally this field should be encapsulated, and current reference sites should be refactored to access the compute client in an abstract and uniform manner.

Need to investigate if the fields need to be public because of the abstract classes from Jenkins.
Original issue
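
One possible shape for the encapsulation, as a sketch only. `ClientHandle` stands in for the plugin's compute client and `EncapsulatedCloud` for the cloud class; both names are hypothetical. Whether the real field can be made private depends on the open question above about the Jenkins base classes.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the plugin's compute client type. */
interface ClientHandle {}

/** Hypothetical stand-in for the cloud class; the client field is private. */
final class EncapsulatedCloud {
    private final Supplier<ClientHandle> factory;
    private ClientHandle client;

    EncapsulatedCloud(Supplier<ClientHandle> factory) {
        this.factory = factory;
    }

    /** Creates the client on first use, so callers do not depend on initialization timing. */
    synchronized ClientHandle getClient() {
        if (client == null) {
            ClientHandle created = factory.get();
            if (created == null) {
                // Fail loudly instead of handing out a null client.
                throw new IllegalStateException("compute client could not be created");
            }
            client = created;
        }
        return client;
    }
}
```

Call sites would then use `cloud.getClient()` rather than reading a field, and a failed setup surfaces as one well-defined exception.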

Cannot launch GCE agents when plugin configured with JCasC

Hi all,

When configuring this plugin using JCasC, GCE agent VMs will not launch. The relevant fields seem to be populated in the Jenkins 'Configure System' UI, but the VMs are not able to launch until Jenkins' configuration is saved using the UI.

Here is my JCasC configuration relating to this plugin. In my case, I'm creating a brand new Jenkins instance from scratch, as you might do when running the Jenkins master in a Docker container.

jenkins:
  clouds:
  - computeEngine:
      cloudName: gce-jenkins-build
      projectId: gce-jenkins
      instanceCapStr: 1
      credentialsId: gce-jenkins
      configurations:
      - namePrefix:         jenkins-agent-image
        description:        Jenkins agent
        launchTimeoutSecondsStr: 6
        retentionTimeMinutesStr: 300
        mode:               EXCLUSIVE
        labelString:        jenkins-agent
        numExecutorsStr:    1
        runAsUser:          jenkins
        remoteFs:           '' # tried not setting this, field added when 'save' clicked in UI
        windows:            false
        windowsPasswordCredentialsId: ''    # tried not setting, added when saved in UI
        windowsPrivateKeyCredentialsId: ''  # tried not setting, added when saved in UI
        oneShot:            true
        createSnapshot:     false
        region:             "https://www.googleapis.com/compute/v1/projects/gce-jenkins/regions/europe-west1"
        zone:               "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a"
        template:           '' # tried not setting, added when 'saved' in UI
        machineType:        "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a/machineTypes/n1-standard-2"
        preemptible:        false
        minCpuPlatform:     '' # tried not setting, added when 'saved' in UI
        startupScript:      '' # tried not setting, added when 'saved' in UI
        networkConfiguration:
          sharedVpc:
            projectId:      gce-jenkins-cloud-123456
            region:         europe-west1
            subnetworkShortName: gce-jenkins-cloud
        networkTags:        jenkins-agent
        externalAddress:    true
        useInternalAddress: false
        bootDiskSourceImageProject: gce-jenkins
        bootDiskSourceImageName: "https://www.googleapis.com/compute/v1/projects/gce-jenkins/global/images/gce-jenkins-build-image"
        bootDiskType:       "https://www.googleapis.com/compute/v1/projects/gce-jenkins/zones/europe-west1-a/diskTypes/pd-standard"
        bootDiskSizeGbStr:  50
        bootDiskAutoDelete: true
        serviceAccountEmail: '[email protected]'

I did a comparison of config.xml before and after hitting save in the UI, and there's no difference in the GCE plugin configuration section, but Jenkins is suddenly able to launch GCE VMs.

(For the fields marked with '#' above, I initially tried leaving that configuration out of the JCasC configuration entirely, but it got added to Jenkins' config.xml when I hit 'save' in the Jenkins 'Configure System' UI, and I still had the same problem of no VMs launching. To minimise the diff between the state of config.xml before and after hitting save in the UI, I added it to the JCasC configuration.)

As noted by @devqore in this JCasC issue, Google Compute nodes are also disconnected during a JCasC configuration reload if Jenkins has more than one permanent node configured in addition to the GCE nodes.

Make path to java executable configurable

Current behavior: The path to the java executable is hard-coded; see https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineLinuxLauncher.java#L130 and https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineWindowsLauncher.java#L129. Java is expected to be found on the $PATH.

Feature request: Please make this path (optionally) configurable.

Use cases

Java is installed on the instance used as a Jenkins slave, but is not on $PATH (my case)
More than one Java is installed on the instance, and a specific one rather than the default should be used
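
A sketch of what an optional setting could look like: fall back to plain `java` (resolved via `$PATH`) when no path is configured. `AgentLaunchCommand` and the `javaExecPath` parameter are hypothetical names, not existing plugin options.

```java
/** Hypothetical helper showing an optional, configurable java path. */
final class AgentLaunchCommand {
    static final String DEFAULT_JAVA = "java"; // resolved via $PATH, as today

    private AgentLaunchCommand() {}

    /** Uses the configured path when present, otherwise the default executable name. */
    static String build(String javaExecPath, String agentJarPath) {
        boolean unset = javaExecPath == null || javaExecPath.trim().isEmpty();
        String java = unset ? DEFAULT_JAVA : javaExecPath.trim();
        return java + " -jar " + agentJarPath;
    }
}
```

Leaving the setting empty keeps the current behavior, so existing configurations would not need to change.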

Link a running worker to the instance config that created it

After a worker has been created, it's not obvious which specific instance configuration created it. On create, we should either set the instance config as a property of the worker (via the Instance or Computer class), or consider giving each instance config a guid when it is created and setting that guid on the GCE instance metadata for the worker when it is created.

See parent issue
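
The second option (a GUID per instance config, copied into the instance metadata) could be sketched as follows. The metadata key `jenkins-instance-config-id` and the class name are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper that tags instance metadata with the originating config's id. */
final class ConfigTagging {
    static final String CONFIG_ID_KEY = "jenkins-instance-config-id"; // assumed key name

    private ConfigTagging() {}

    /** Generated once, when the instance configuration is created. */
    static String newConfigId() {
        return UUID.randomUUID().toString();
    }

    /** Returns a copy of the metadata with the config id added; the input is not modified. */
    static Map<String, String> withConfigId(Map<String, String> metadata, String configId) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.put(CONFIG_ID_KEY, configId);
        return copy;
    }
}
```

A running worker could then be matched to its configuration by comparing the metadata value with the stored id.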

Allow way to check if instance is ready before starting agent

I am doing some post-boot stuff with cloud-init and startup scripts to prepare my instance for Jenkins (mounting NVMe disks and post-boot Ansible playbook). I'm actually wanting to store Jenkins workspace data on the NVMe SSD for performance reasons which means I need to wait for this to complete before the plugin can copy and start the agent.jar.

In 3.0.0 I was able to hack around this by creating a custom /usr/local/bin/java script that was picked up when the instance installer first checked output of java -version. The wrapper script would artificially hang until the Ansible playbook had completed (to Jenkins it just appears that the command returns very slowly).

This issue is a feature request to have a way to wait for the instance to be ready via external method. The simple fix to enable the hack again is to just move the java -version check before the agent.jar copy. But I actually think longer term it might make sense to add a check for the cloud-init Final stage using cloud-init status --wait or the /var/lib/cloud/instance/boot-finished file. This would likely account for my scenario and others where people bootstrap instances using cloud-init where there might be stuff required before the agent can actually run jobs.
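
A rough sketch of the boot-finished check described above: poll for `/var/lib/cloud/instance/boot-finished` over the existing SSH session before copying agent.jar. `RemoteShell` is a hypothetical abstraction over whatever the launcher uses to run a remote command and read its exit code.

```java
/** Hypothetical abstraction: runs a command on the instance and returns its exit code. */
interface RemoteShell {
    int exitCodeOf(String command);
}

/** Hypothetical readiness gate that runs before the agent is copied and started. */
final class BootReadiness {
    static final String BOOT_FINISHED_CHECK = "test -f /var/lib/cloud/instance/boot-finished";

    private BootReadiness() {}

    /** Returns true once the marker file exists, false after maxAttempts failed checks. */
    static boolean waitForBootFinished(RemoteShell shell, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (shell.exitCodeOf(BOOT_FINISHED_CHECK) == 0) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

Polling with a bounded number of attempts keeps the launch timeout meaningful; running `cloud-init status --wait` in a single call would be an alternative on images that ship that command.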

One-Shot not functioning

Hey there, I've noticed recently that the one-shot feature is not functioning correctly: my jobs are all trying to load onto the same instance, despite the flag being enabled.

Fail to create agents with GPU attached from Machine Configuration in Jenkins settings UI. Instances with guest accelerators do not support live migration.

Hello, it seems that it is possible to create a GPU agent by specifying an instance template for creating instances, but it is not possible to create a GPU agent without specifying an instance template. See related issue https://issues.jenkins-ci.org/browse/JENKINS-52708.

As a workaround, you can use the following: dkozlov@7b7af84

Could you please disable GPU support in the Machine Configuration UI or fix it?


Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@3bafb6a8 for excess workload of 1 units of label 'jenkins-gpu'

Apr 08, 2019 5:23:23 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity

Found capacity for 99 nodes in cloud 

Apr 08, 2019 5:23:24 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision

Error provisioning node
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Instances with guest accelerators do not support live migration.",
    "reason" : "badRequest"
  } ],
  "message" : "Instances with guest accelerators do not support live migration."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.jenkins.plugins.computeengine.client.ComputeClient.insertInstance(ComputeClient.java:374)
	at com.google.jenkins.plugins.computeengine.InstanceConfiguration.provision(InstanceConfiguration.java:319)
	at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provision(ComputeEngineCloud.java:203)
	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

[Code Cleanup]: Properly handle failed compute client setup.

The ComputeEngineCloud currently does not properly handle the case where the ComputeClient fails to be set up.

[Bug]: instanceId missing with release 3.1.0

Reported on #gcp-jenkins.
This issue appears to be breaking the plugin for customers

Intermittent key exchange error and GCP "Internal Error" 13 on machine launch

We have been using this plugin in Jenkins installed from this guide successfully for some time now. Recently, however, about 75% of the machines launched automatically by Jenkins fail to come online, leading to errors shown in the GCP logs.

We are seeing that when the launches fail, the Jenkins node logs show multiple

Failed to connect via ssh: The connect() operation on the socket timed out.

errors before the machine is terminated. We have also intermittently been seeing "Internal Error" 13 in GCP logs, and 404 errors when Jenkins tries to delete instances that were otherwise terminated in GCP.

We are unable to reproduce by launching machines manually with the same machine templates in GCP and connecting them to Jenkins by hand - this succeeds every time.

Instance templates don't support "Use Internal IP?"

I wanted to use instance templates when setting up a new Jenkins instance, but I've found that the "Use Internal IP?" appears to be ignored when they're used.

That option should probably be moved out of the "Advanced" section, so it can still be toggled even when using instance templates.

de-duplicate integration tests

Currently there exist two sets of duplicate integration tests for Linux and Windows. This creates an additional maintenance burden: when tests need to be updated, they have to be updated in both sets. Ideally, there should be one set of integration tests that can take different configurations as parameters (which could help distinguish which OS we're testing, etc).

Also, might move these out of ComputeEngineCloud since these don't really test the cloud; they test more of the configuration. Will need to parameterize some shared constants (util file).

Reference issue

${WORKSPACE} is now /tmp/workspace/<job>

I don't know when or how this changed from ./.jenkins-slave, but it is now copying to this directory.
I have tried manually defining /home/ubuntu/.jenkins-slave, but this fails because the directory does not exist.

Instance cap can be exceeded when multiple Google Clouds configured

I have this configuration:

and yet somehow Jenkins decided to fire up 3 instances?

I have a second Google Cloud configured, with an instance cap of 2 for the linux agents.

Unable to connect to GCE using plugin via ssh

[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-56988]

I use the plugin to create a slave node on GCE, but the node is always offline.

The log is below. SSH cannot connect to GCE.

just before slave jenkins-slave-916wdt gets launched ...
executing pre-launch scripts ...
Apr 12, 2019 2:31:47 AM null
FINEST: Instance jenkins-slave-916wdt is running and ready...
Apr 12, 2019 2:31:47 AM null
INFO: Launching instance: jenkins-slave-916wdt
Apr 12, 2019 2:31:54 AM null
INFO: bootstrap
Apr 12, 2019 2:31:54 AM null
INFO: Getting keypair...
Apr 12, 2019 2:31:54 AM null
INFO: Using autogenerated keypair
Apr 12, 2019 2:31:54 AM null
INFO: Authenticating as jenkins
Apr 12, 2019 2:31:55 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:31:56 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:31:56 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:01 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:01 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:32:01 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:06 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: connect fresh as root
Apr 12, 2019 2:32:07 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: Copying agent.jar to: /tmp
Apr 12, 2019 2:32:09 AM null
INFO: Verifying: java -fullversion
bash: java: command not found
Apr 12, 2019 2:32:09 AM null
WARNING: Java is not installed.
Apr 12, 2019 2:32:09 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar /tmp/agent.jar
Apr 12, 2019 2:32:09 AM null
WARNING: Error getting exception Exception: java.io.IOException: SSH channel is closed

Migrate Issues and Docs to GitHub

Migrate Issues from Jira to GitHub: https://issues.jenkins-ci.org/browse/JENKINS-56331?jql=project%20%3D%20JENKINS%20AND%20component%20%3D%20google-compute-engine-plugin%20
Leave a comment on each issue with a link to the corresponding GitHub issue, and then close the issue in Jira.
Create a guide-post issue template in Jira directing folks to GitHub issues
Migrate the docs from the Jenkins Wiki to GitHub MD: https://wiki.jenkins.io/display/JENKINS/Google+Compute+Engine+Plugin
Leave a guide-post link from the wiki pointing to the GitHub docs.

Triggering many (100+) concurrent builds causes some builds to not get parameters set

Environment :

Jenkins ver. 2.150.2 running in docker on an ubuntu 18.04 host in GCP
Google compute engine plugin 2.0.0 (latest available)

Using an instance template to spin up n1-standard-4 VMs using SSDs

Jenkins master process is started with the following JVM Options for faster response when workers are needed. (Still experimenting with the right values).

JAVA_OPTS="-Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"

Description of Problems

The problem is that certain job runs are not passed the parameters they should be getting and they fail.

This seems to happen more often when 100 or more job runs are triggered concurrently by a trigger, and sometimes with as few as 50.

The failed jobs seem to get an executor on a worker, and start running but fail because the parameters they should have received aren't there. So the processing that expects valid parameters values fails.

To test this setup I set up 2 Pipeline groovy jobs. The content of both jobs is attached.

Parent job

runs on a worker provisioned by the GCE plugin
is triggered with some parameters, including 1 that tells it how many instances of the child job to trigger

Child job

is triggered multiple times by the parent job and passed some parameters
each run of this job requires its own worker
does something reasonably simple
- verifies the worker it is running on is ready by checking for a file (this is just verifying that any necessary build caches like gradle, npm, pip, are on the worker)
- using the parameters passed in by the parent job, it tries to download a file from GCS
- then it sleeps for 15 mins, just to hold the worker
When this job fails, it is because the parameters it should have received are missing and it can't do the download of an artifact from GCS

I have attached screen shots of what the parameters page looks like on a successful run as well as an unsuccessful run.

parent-job-example.groovy.txt
child-job-example.groovy.txt

Support stop GCE instance instead of Termination

As requested here

"Currently the google-compute-engine-plugin creates jenkins node and terminates them if idle. Instead of terminate, it's better to support stop the instance when idle."

Cannot select machine image outside of fixed project list

We have a separate GCE project that runs packer to build our GCE machine images. We can configure the GCE Jenkins plugin to use these via init.groovy.d scripting, but the result is that the UI is actively dangerous to use because clicking "Save" after changing anything on the page will cause the machine image URI to be cleared to the empty string. I would very much like the ability to type a free-form GCE project name or even a full GCE machine image URI; failing that, I would like for the Save button to not cause an outage.

windows GCE instance connects but then report "java.io.EOFException: unexpected stream termination"

[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-55515]

I can't use Windows instances in GCE because of a problem launching the agent. Please help me debug/resolve the issue.

Steps to reproduce:

Create windows instance in GCE
Login to the instance by RDP and add user tester with Administrator role
Login to the instance by RDP as tester
Install java8, cygwin with openssh, configure openssh (see how to here: https://docs.oracle.com/cd/E24628_01/install.121/e22624/preinstall_req_cygwin_ssh.htm#EMBSC281)
Check you are able to connect by ssh with tester user and its password
Create private/public rsa keypair (using ssh-keygen), put public key to the /home/tester/.ssh/authorized_keys file
Copy generated private key to your computer to ~/key.txt
Check you are able to connect to the instance with private key without password:
ssh -i ~/key.txt tester@<ip_address>
Stop the instance and create an image from the instance
Goto http://<your_jenkins_address>/credentials/ page and add new credentials "SSH Username with private key", choose "enter directly" for private key and put generated private key here
Add new "Instance configuration" for created image on http://<your_jenkins_address>/configure page: set "Windows?" checkbox, set "Windows Username"=tester, set "Windows SSH Private Key Credentials" to credentials that were created on prev step, set Labels=windows-gce-test, set "Remote Location"=C:\jenkins
Run job with a label "windows-gce-test"
Expected: new GCE instance and jenkins slave are created, the slave is successfully connected and job is successfully ended

Actual: new GCE instance and jenkins slave are created, but slave can't connect

Slave output is the following:
INFO: Connecting to 35.233.217.99 on port 22, with timeout 10000.
Jan 10, 2019 4:42:07 AM null
INFO: Connected via SSH.
Jan 10, 2019 4:42:08 AM null
INFO: Copying slave.jar to: C:
Jan 10, 2019 4:42:11 AM null
INFO: Verifying: java -fullversion
openjdk full version "1.8.0_181-b02"
Jan 10, 2019 4:42:12 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar C:\slave.jar
Jan 10, 2019 4:42:12 AM null
WARNING: Error: Exception: java.io.EOFException: unexpected stream termination

Jenkins log:
Connected via SSH.
Jan 10, 2019 5:04:02 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Copying slave.jar to: C:
Jan 10, 2019 5:04:03 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Verifying: java -fullversion
Jan 10, 2019 5:04:03 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Launching Jenkins agent via plugin SSH: java -jar C:\slave.jar
Jan 10, 2019 5:04:03 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Error:
java.io.EOFException: unexpected stream termination
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:408)
at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:353)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:415)
at com.google.jenkins.plugins.computeengine.ComputeEngineWindowsLauncher.launch(ComputeEngineWindowsLauncher.java:128)
at com.google.jenkins.plugins.computeengine.ComputeEngineComputerLauncher.launch(ComputeEngineComputerLauncher.java:127)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:288)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Jan 10, 2019 5:04:03 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1 call
Exception waiting for node zulu-win2016-tests-gce-enn0eu to connect
java.io.IOException: Agent failed to connect, even though the launcher didn't report it. See the log output for details.
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:312)
Caused: java.util.concurrent.ExecutionException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1.call(ComputeEngineCloud.java:171)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud$1.call(ComputeEngineCloud.java:161)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I'm not sure that GCE uses "Remote Location" for Windows slaves (because the log says "INFO: Copying slave.jar to: C:"). Could that be the cause of the problem?

I'm unable to create a GCE node because of JENKINS-55380, but I tried creating a permanent agent node and the agent starts normally for it. I used the following parameters:
Permanent Agent
Remote root directory=.
Launch method=Launch agent agents via SSH
Credentials=<created_credentials_with_private_key>

Node log:
[01/10/19 05:14:27] [SSH] Checking java version of ./jdk/bin/java
Couldn't figure out the Java version of ./jdk/bin/java
bash: ./jdk/bin/java: No such file or directory

[01/10/19 05:14:27] [SSH] Checking java version of java
[01/10/19 05:14:27] [SSH] java -version returned 1.8.0_181.
[01/10/19 05:14:27] [SSH] Starting sftp client.
[01/10/19 05:14:28] [SSH] Copying latest remoting.jar...
[01/10/19 05:14:30] [SSH] Copied 762,466 bytes.
Expanded the channel window size to 4MB
[01/10/19 05:14:30] [SSH] Starting agent process: cd "." && java -jar remoting.jar -workDir .
Jan 10, 2019 1:14:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using .\remoting as a remoting work directory
Both error and output logs will be printed to .\remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.17
This is a Windows agent
NOTE: Relative remote path resolved to: C:\cygwin64\home\tester.
Agent successfully connected and online

GCE plugin shouldn't cleanup nodes from different jenkins masters

[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-56201]

Upgrading the google-compute-engine-plugin to version 1.0.10 causes VM instances that were created in the same GCP project, but by a different Jenkins master, to be removed.

It looks like these changes

google-compute-engine-plugin/src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java

Line 66 in 7ee660e

private void cleanCloud(ComputeEngineCloud cloud) {

try to clean up all nodes with the same instanceUniqueId, which is resolved only from the cloud name. I think it is quite likely that, for simplicity, the cloud name is in most cases identical to the project ID.

So the detection of unique instances should probably ensure that VM instances come from the same Jenkins master before they are deleted.

As a workaround, different cloud names could be used on each Jenkins master.
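
A sketch of the stricter check suggested above: an instance is only eligible for cleanup when it carries both the cloud id and an id unique to the Jenkins master that created it. The label keys `jenkins_cloud_id` and `jenkins_master_id` are hypothetical; the plugin's actual labels may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical filter for the lost-node cleanup. */
final class LostNodeFilter {
    static final String CLOUD_ID_LABEL = "jenkins_cloud_id";   // assumed label key
    static final String MASTER_ID_LABEL = "jenkins_master_id"; // assumed label key

    private LostNodeFilter() {}

    /** Returns the sorted names of instances whose labels match both the cloud and the master. */
    static List<String> deletable(Map<String, Map<String, String>> labelsByInstance,
                                  String cloudId, String masterId) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : labelsByInstance.entrySet()) {
            Map<String, String> labels = entry.getValue();
            boolean sameCloud = cloudId.equals(labels.get(CLOUD_ID_LABEL));
            boolean sameMaster = masterId.equals(labels.get(MASTER_ID_LABEL));
            if (sameCloud && sameMaster) {
                names.add(entry.getKey());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Instances that lack the master label (for example, ones created by an older plugin version) are never selected, which errs on the side of leaving VMs running.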

Migrate to using managed instance groups

A good amount of the instance provisioning, cleanup, and scaling logic can be eliminated from the code base by utilizing GCE's managed instance groups: https://cloud.google.com/compute/docs/instance-groups/

This would also solve some of the outstanding issues with efficiency.

Fix remote file location for Windows

Right now, the Windows launcher launches agents in a location hard-coded into the launcher, even though we allow users to specify one. This bug needs to be fixed.

Reference issue
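
A sketch of the intended behavior, assuming the fix is simply to honor the configured remote location and fall back to the current default only when none is set. The class name and the `C:\` default are assumptions based on the log output quoted in the Windows issue above.

```java
/** Hypothetical helper that derives the agent jar location on Windows. */
final class WindowsAgentPaths {
    static final String DEFAULT_REMOTE_FS = "C:\\"; // assumed current hard-coded location

    private WindowsAgentPaths() {}

    /** Places agent.jar under the user-specified remote location when one is configured. */
    static String agentJarPath(String remoteFs) {
        boolean unset = remoteFs == null || remoteFs.trim().isEmpty();
        String base = unset ? DEFAULT_REMOTE_FS : remoteFs.trim();
        if (!base.endsWith("\\")) {
            base = base + "\\";
        }
        return base + "agent.jar";
    }
}
```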

Code Cleanup: Use builder for InstanceConfiguration

Tech debt note: the constructor for InstanceConfiguration has gotten large enough to warrant use of a builder.

Originally posted by @craigatgoogle in #55

Related: We should move all non-essential fields into @DataboundSetters. This would help to cut down on the constructor size.
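
For illustration, a builder along these lines keeps required fields explicit and lets optional ones default. `InstanceSettings` is a reduced, hypothetical stand-in with only a handful of the fields that the real InstanceConfiguration carries.

```java
/** Hypothetical, reduced stand-in for InstanceConfiguration. */
final class InstanceSettings {
    private final String namePrefix;
    private final String zone;
    private final String machineType;
    private final boolean preemptible;
    private final boolean oneShot;

    private InstanceSettings(Builder b) {
        this.namePrefix = b.namePrefix;
        this.zone = b.zone;
        this.machineType = b.machineType;
        this.preemptible = b.preemptible;
        this.oneShot = b.oneShot;
    }

    static Builder builder() { return new Builder(); }

    String namePrefix() { return namePrefix; }
    String zone() { return zone; }
    String machineType() { return machineType; }
    boolean preemptible() { return preemptible; }
    boolean oneShot() { return oneShot; }

    static final class Builder {
        private String namePrefix;
        private String zone;
        private String machineType;
        private boolean preemptible; // optional, defaults to false
        private boolean oneShot;     // optional, defaults to false

        Builder namePrefix(String v) { this.namePrefix = v; return this; }
        Builder zone(String v) { this.zone = v; return this; }
        Builder machineType(String v) { this.machineType = v; return this; }
        Builder preemptible(boolean v) { this.preemptible = v; return this; }
        Builder oneShot(boolean v) { this.oneShot = v; return this; }

        /** Validates required fields in one place instead of in a long constructor. */
        InstanceSettings build() {
            if (namePrefix == null || zone == null || machineType == null) {
                throw new IllegalStateException("namePrefix, zone and machineType are required");
            }
            return new InstanceSettings(this);
        }
    }
}
```

Usage would read as `InstanceSettings.builder().namePrefix("jenkins-agent-image").zone("europe-west1-a").machineType("n1-standard-2").oneShot(true).build()`.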

SSH key injection happens after instance is created, rather than before

Seems like the SSH key injection via the plugin is happening unnecessarily late.

The SSH public key can be appended to the metadata before instance creation, rather than at launch time.

This way startup scripts can reference the ssh keys in metadata.
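
As a sketch, the public key could be added to the metadata that is sent with the instance-creation request. Compute Engine reads per-instance keys from the `ssh-keys` metadata entry, one `username:key` pair per line; the helper name and the way the plugin would call it are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that prepares ssh-keys metadata before the instance is created. */
final class SshKeyMetadata {
    static final String SSH_KEYS = "ssh-keys";

    private SshKeyMetadata() {}

    /** Returns a copy of the metadata with "user:publicKey" appended to the ssh-keys entry. */
    static Map<String, String> withSshKey(Map<String, String> metadata, String user, String publicKey) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        String entry = user + ":" + publicKey.trim();
        String existing = copy.get(SSH_KEYS);
        boolean empty = existing == null || existing.isEmpty();
        copy.put(SSH_KEYS, empty ? entry : existing + "\n" + entry);
        return copy;
    }
}
```

Appending rather than overwriting keeps any keys that an instance template already defines.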

No support for adding shutdown script

https://cloud.google.com/compute/docs/shutdownscript documents the process for adding a shutdown script. This would be very useful for us when dealing with pre-empted nodes. The functionality would be the same as for the startup script. I have it working locally, so I will put up a PR if this issue gets backing.

"Default" subnetwork in config.xml is configured in the wrong format and the rest of my subnets don't even appear in the drop down

After setting up the plugin using the 'default' subnet, I got the following error when trying to add a slave:

"Invalid value for field 'resource.networkInterfaces[0]': '{ \"accessConfig\": [{ \"type\": \"ONE_TO_ONE_NAT\", \"name\": \"External NAT\" }]}'. Subnetwork should be specified for custom subnetmode network"

On the Jenkins server, I opened config.xml and found:

<networkConfiguration class="com.google.jenkins.plugins.computeengine.AutofilledNetworkConfiguration">
<network>https://www.googleapis.com/compute/v1/projects/zoominfo-2/global/networks/default</network>
<subnetwork>default</subnetwork>

I manually updated the subnetwork element to <subnetwork>https://www.googleapis.com/compute/v1/projects/zoominfo-2/regions/us-east1/subnetworks/default</subnetwork>, reloaded the config in the Jenkins UI, and everything worked fine. Then I updated the config in the UI and the config reverted. I also wanted to use a different subnet, but 'default' was the only option.
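
The manual fix above amounts to expanding a short subnetwork name into its full resource URL. A minimal sketch of that normalization follows; the class name is hypothetical and this is not the plugin's actual code path.

```java
/** Hypothetical helper that expands a short subnetwork name into a full resource URL. */
final class SubnetworkUrls {
    private static final String API_ROOT = "https://www.googleapis.com/compute/v1";

    private SubnetworkUrls() {}

    /** Leaves full URLs untouched; otherwise builds projects/<p>/regions/<r>/subnetworks/<name>. */
    static String toUrl(String project, String region, String subnetwork) {
        if (subnetwork.startsWith("https://")) {
            return subnetwork;
        }
        return API_ROOT + "/projects/" + project + "/regions/" + region
                + "/subnetworks/" + subnetwork;
    }
}
```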

SSH connection to Jenkins slave is established twice

Current behavior: This plugin starts up a new instance in GCE. It connects to the new instance via SSH as a user configured in the plugin settings in Jenkins (for example, the user "jenkins"). If this is successful, the plugin reconnects to the instance via SSH as root. The Jenkins slave is then started as root.
See https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineLinuxLauncher.java#L110 and https://github.com/jenkinsci/google-compute-engine-plugin/blob/master/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineWindowsLauncher.java#L112

Question: Why is the Jenkins slave started as root?

Downsides: As far as I can see, the Jenkins slave could be started as the first SSH user, too. This should decrease the time it takes to set up a new Jenkins slave. Furthermore, running the Jenkins slave as root should be avoided for security reasons.

One-shot feature kills slaves of unrelated projects

Greetings,

We've installed v2.0 on several of our jenkins masters, and encountered the following issue:

Our Jenkins masters are independent instances; most teams run their own masters. But slaves are launched and run in one Google project, maintained by IT. For some masters (teams), the new "one-shot" feature is enabled; for others, it is not. Some remaining masters are still running older versions of the plugin without the new one-shot and snapshot features at all.

Apparently, some of the masters with one-shot enabled will start killing slaves of other masters where one-shot is disabled, probably following some naming pattern which can be similar for different masters. As a result, some teams end up with their slaves killed by other teams' masters while jobs are still running: e.g. a "Team A" slave gets killed by the "Team B" master, although these are two independent teams.

Unable to set up periodic deletion of created snapshots

How can I configure automated deletion of the snapshots created by the Compute Engine plugin in Jenkins? There is no option for that in Jenkins, and I did not find anything in Google Compute Engine that would apply to snapshots created by the plugin. Could you please help me with it?
Thank you
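
As a rough illustration of what such a cleanup could look like outside the plugin, the sketch below selects snapshots older than a retention period. It covers only the selection step; how the snapshot names and creation times are obtained, and how the deletion is issued, is left open. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: picks snapshots whose age exceeds a retention period.
final class SnapshotRetention {
    // createdAtByName maps snapshot name to its creation time.
    static List<String> expired(Map<String, Instant> createdAtByName, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : createdAtByName.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A scheduled job could call `expired(...)` periodically and delete the returned snapshots.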

Issues adding GCE instance manually

Unable to add a GCE node (Manage Jenkins -> Manage Nodes -> New Node -> type a name and select "google compute engine") because of an error.

I've seen a couple of users experience this issue although it is not our recommended workflow.

Reference issue

GCE autoscaling behavior not being observed

When there are 2 build agents available, I run 3 jobs. The 2 build agents each pick up one job (which sleeps for 30s), while the 3rd job sits in the queue.

I expected the plugin to spin up a third build agent to pick up the third job.

Here are my configuration settings.

Code Cleanup: Enable test parallelization by removing shared test state

This is similar to jenkinsci/google-kubernetes-engine-plugin#27

Here the main focus will be on ComputeEngineCloudIT.java because its tests take the most time. I'll also look at addressing #40 so that I don't have to make this change twice.

Terminate on migrate for instances with GPUs attached

Based off: #63

The scheduling in InstanceConfiguration.java should be changed to terminate on host maintenance, but only for instances with GPUs.

See dkozlov@7b7af84 for an example
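
Independently of the linked commit, the intended rule can be sketched as follows. `MIGRATE` and `TERMINATE` are the Compute Engine values for the on-host-maintenance scheduling setting; the class and method names are hypothetical and do not reflect the plugin's actual code.

```java
// Sketch only: chooses the on-host-maintenance behavior for an instance.
// Instances with GPUs cannot be live migrated, so they must terminate instead.
final class MaintenancePolicy {
    static String onHostMaintenance(int acceleratorCount) {
        return acceleratorCount > 0 ? "TERMINATE" : "MIGRATE";
    }
}
```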

[Code Cleanup]: Consolidate duplicated launcher logic

There exists a large degree of duplicated logic in both launch methods for Linux and Windows. Ideally this logic should be consolidated. See:

google-compute-engine-plugin/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineLinuxLauncher.java

Line 91 in a6ec341

 protected void launch(ComputeEngineComputer computer, TaskListener listener, Instance inst) 

google-compute-engine-plugin/src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineWindowsLauncher.java

Line 90 in a6ec341

 protected void launch(ComputeEngineComputer computer, TaskListener listener, Instance inst) 
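
One common way to consolidate such duplication is the template method pattern: the shared flow lives in a base class and only the OS-specific details are overridden. The sketch below is illustrative only. The class names are hypothetical, the Linux path mirrors the `/tmp` location seen in the launcher logs, and the Windows path is an assumption.

```java
// Sketch only: shared launch flow with OS-specific hooks.
abstract class LauncherSketch {
    // Shared logic: builds the command used to start the agent.
    final String launchCommand() {
        return "java -jar " + agentJarPath();
    }

    // OS-specific hook: where agent.jar is copied on the remote machine.
    protected abstract String agentJarPath();
}

final class LinuxLauncherSketch extends LauncherSketch {
    @Override
    protected String agentJarPath() {
        return "/tmp/agent.jar";
    }
}

final class WindowsLauncherSketch extends LauncherSketch {
    @Override
    protected String agentJarPath() {
        return "C:\\agent.jar"; // assumed location, for illustration only
    }
}
```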

Parallelize snapshot creation

Parallelize the snapshot creation since it currently blocks.

Reference issue
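
One possible reading of this request is to run several snapshot creations concurrently rather than one after another. The sketch below shows that idea with `CompletableFuture`. The `createSnapshot` function is a stand-in for whatever call actually creates a snapshot, and the class name is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch only: runs snapshot creation for several disks concurrently.
final class ParallelSnapshots {
    static List<String> createAll(
            List<String> disks, Function<String, String> createSnapshot, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every snapshot request first so they run in parallel.
            List<CompletableFuture<String>> futures = disks.stream()
                    .map(disk -> CompletableFuture.supplyAsync(() -> createSnapshot.apply(disk), pool))
                    .collect(Collectors.toList());
            // Then wait for all of them; results keep the order of the input list.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```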

Using GCE feature 'deploying-containers' fails because of missing Java on host

This plugin supports using GCE instance templates for provisioning a new Jenkins SSH slave.

I created a new instance template in Google compute engine, which uses the GCE feature deploy containers. As the Container-image, I am using openjdk:11-jre-slim.

When using this plugin, a new VM is booted up by my Jenkins job, but Jenkins master fails to connect to it.

From Jenkins log:

May 03, 2019 9:13:31 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connecting to 35.209.XXX.YY on port 22, with timeout 10000.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connected via SSH.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
connect fresh as root
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connecting to 35.209.254.68 on port 22, with timeout 10000.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Connected via SSH.
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Copying agent.jar to: /tmp
May 03, 2019 9:13:35 AM INFO com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Verifying: java -fullversion
May 03, 2019 9:13:35 AM WARNING com.google.jenkins.plugins.computeengine.ComputeEngineCloud log
Java is not installed.

This failure seems quite obvious to me: there is no Java installed on the host with the (external) IP 35.209.XXX.YY. However, the Docker image openjdk:11-jre-slim is available on this machine and could be used for executing Java.

Please let this plugin support the 'deploying-containers' feature.

PS: This is a follow-up of https://issues.jenkins-ci.org/browse/JENKINS-52251

Remote call on jenkins-workers failed. The channel is closing down or has closed down

Hi there
We are using google-compute-engine-plugin 3.0.0 and Jenkins 2.164.3, and we are having problems with what seems to be the plugin calling a delete on the VM before the Jenkins job completes.
Google Compute Engine plugin Timeout settings for these instances are:
Launch Timeout: 300
Node Retention Time: 6

In our Jenkins build logs there are errors like:

20:48:52 Cannot contact jenkins-workers-eciv4a: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on jenkins-workers-eciv4a failed. The channel is closing down or has closed down.

This happens in the middle of the job, when there is still activity and network connectivity (confirmed via VPC flow logs).

After this error the job eventually times out and fails.
However, in Stackdriver we see that several seconds before the error above there is a delete call on that instance, which seems to come from the plugin:

20:48:45

{
insertId: "qxcwjoe2k5x4"
logName: "projects/project/logs/cloudaudit.googleapis.com%2Factivity"
operation: {
first: true
id: "operation-1559854124970-58aadd705331f-3facdafe-cec32e65"
producer: "type.googleapis.com"
}
protoPayload: {
@type: "type.googleapis.com/google.cloud.audit.AuditLog"
authenticationInfo: {
principalEmail: "[email protected]"
}
authorizationInfo: [
0: {
granted: true
permission: "compute.instances.delete"
resourceAttributes: {
name: "projects/project/zones/us-central1-b/instances/jenkins-workers-eciv4a"
service: "compute"
type: "compute.instances"
}
}
]
methodName: "v1.compute.instances.delete"
request: {
@type: "type.googleapis.com/compute.instances.delete"
}
requestMetadata: {
callerIp: "10.2.0.94"
callerNetwork: "//compute.googleapis.com/projects/project/global/networks/unknown"
callerSuppliedUserAgent: "jenkins-google-compute-plugin Google-HTTP-Java-Client/1.24.1 (gzip),gzip(gfe)"
destinationAttributes: {
}
requestAttributes: {
auth: {
}
time: "2019-06-06T20:48:45.071Z"
}
}
Could this be a bug in the plugin or a configuration issue? Is there any way to get extra logging from the plugin that would help determine the cause?

When a compute instance gets preempted, the plugin should abort/reschedule the jobs that were running on the instance

Right now when an instance gets preempted, the jobs that were running on the instance just stall, with the agent in an (offline) status. The builds never progress and never change state, even though the underlying machine is no longer there.

Since the plugin can periodically detect that the VM no longer exists, it should delete the agent from Jenkins and either abort or reschedule the jobs that were running on that agent.
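
The detection step could look roughly like the sketch below: given the builds currently running on each agent and the set of instances that still exist, it returns the builds that need to be aborted or rescheduled. This is a minimal sketch with hypothetical names; the actual lookups against Jenkins and GCE are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: finds builds whose agent's VM no longer exists.
final class PreemptionSweep {
    static List<String> buildsOnMissingInstances(
            Map<String, List<String>> runningBuildsByAgent, Set<String> existingInstances) {
        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : runningBuildsByAgent.entrySet()) {
            // An agent whose instance is gone can never finish its builds.
            if (!existingInstances.contains(entry.getKey())) {
                affected.addAll(entry.getValue());
            }
        }
        return affected;
    }
}
```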

Virtual machines connected to Jenkins via Compute Engine plugin are terminated periodically within an hour

I use the Compute Engine plugin (v3.0.0) to connect GCE instances to Jenkins CI (v2.159). Jenkins automatically creates the instances (e.g. CentOS 6 and 7, Debian 9; I tried the official images that Google Compute Engine provides) when a job is started, but at a specific time every hour (e.g. every XX:57; yesterday it was every XX:53) all these machines are terminated, no matter how long they have been running. The machine logs contain only information about the shutdown, nothing special:

...
08:46:33 jenkins-gce-cent-7-cv5jlc systemd: Startup finished in 1min 30.753s.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Power key pressed.
08:47:54 jenkins-gce-cent-7-cv5jlc systemd-logind: Powering Off...
...

I have no timeout or preemptibility set on the machines.
When I start the same GCE instance manually in the Google Cloud console and connect it to Jenkins via its IP address (I use the internal IP address and a VPN), the problem does not appear.
I tried changing many parameters of the Google Compute Engine plugin in Jenkins (connection timeout, One Shot option, Node retention time, etc.), but nothing helped.
In the Operations log in Google Compute Engine I can see that the Delete operation was initiated by the Jenkins account.

Steps to reproduce:
Prepare a template in GCE, use it in Jenkins with the Google Compute Engine plugin, and start a job; within an hour the machines will be terminated.

I attach the Jenkins log for the connected machine and /var/log/messages from the virtual machine.

messages-20190405.txt
jenkins_slave_log.txt

Cannot connect to GCE slave via SSH

I use the plugin to create a GCE slave, but the slave is always offline.
According to the log, Jenkins cannot connect to the slave via SSH.

just before slave jenkins-slave-916wdt gets launched ...
executing pre-launch scripts ...
Apr 12, 2019 2:31:47 AM null
FINEST: Instance jenkins-slave-916wdt is running and ready...
Apr 12, 2019 2:31:47 AM null
INFO: Launching instance: jenkins-slave-916wdt
Apr 12, 2019 2:31:54 AM null
INFO: bootstrap
Apr 12, 2019 2:31:54 AM null
INFO: Getting keypair...
Apr 12, 2019 2:31:54 AM null
INFO: Using autogenerated keypair
Apr 12, 2019 2:31:54 AM null
INFO: Authenticating as jenkins
Apr 12, 2019 2:31:55 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:31:56 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:31:56 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:01 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:01 AM null
INFO: Failed to connect via ssh: There was a problem while connecting to 35.229.250.191:22
Apr 12, 2019 2:32:01 AM null
INFO: Waiting for SSH to come up. Sleeping 5.
Apr 12, 2019 2:32:06 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: connect fresh as root
Apr 12, 2019 2:32:07 AM null
INFO: Connecting to 35.229.250.191 on port 22, with timeout 10000.
Apr 12, 2019 2:32:07 AM null
INFO: Connected via SSH.
Apr 12, 2019 2:32:07 AM null
INFO: Copying agent.jar to: /tmp
Apr 12, 2019 2:32:09 AM null
INFO: Verifying: java -fullversion
bash: java: command not found
Apr 12, 2019 2:32:09 AM null
WARNING: Java is not installed.
Apr 12, 2019 2:32:09 AM null
INFO: Launching Jenkins agent via plugin SSH: java -jar /tmp/agent.jar
Apr 12, 2019 2:32:09 AM null
WARNING: Error getting exception Exception: java.io.IOException: SSH channel is closed

[Feature] Include help text that clarifies requirements for boot disk

From #69 I discovered that currently users only learn through the documentation that the boot disk image must have Java 8 installed.

Currently, only the bootDiskAutoDelete field has help text. This tracks the work to add help text to the remaining boot disk fields to provide users information on the boot disk requirements directly in the UI.

The channel is closing down or has closed down

Channel "unknown": Remote call on jenkins-worker-rcisma failed. The channel is closing down or has closed down

Occasionally the plugin leaves orphaned, stopped VMs

[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-52736]

This wastes compute resources and incurs cost.

Ideally the plugin would not leave such VMs behind. In addition, it could run a periodic check (every 5 minutes) that goes through the current VMs in the project, finds those tagged with "jenkins", and automatically terminates any tagged VM that is not known to Jenkins. This would make it resilient against unexpected Jenkins restarts, etc. (though it should be optional, in case multiple Jenkins instances share the same GCE project).
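
The core of such a periodic check is a set difference between the tagged VMs in the project and the nodes Jenkins knows about. A minimal sketch, with hypothetical names and without the actual GCE or Jenkins lookups:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch only: VMs tagged for Jenkins that Jenkins itself does not know about.
final class OrphanFinder {
    static Set<String> orphans(Set<String> taggedInstancesInProject, Set<String> nodesKnownToJenkins) {
        Set<String> result = new TreeSet<>(taggedInstancesInProject);
        result.removeAll(nodesKnownToJenkins);
        return result;
    }
}
```

When several Jenkins instances share one GCE project, this check would flag the other instances' VMs as orphans, which is why the proposal makes it optional.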

Certain integration tests break if order changes

Running testNoSnapshotCreatedSnapshotNull and then testNoSnapshotCreatedInstanceStopping in ComputeEngineCloudNoSnapshotCreatedIT causes failure.

Need to deep dive into this issue since tests should pass regardless of order.

I'll hopefully complete this by 5/3/19 as I will be primary bug duty.

Improve documentation for parameter instance cap per cloud

Referencing this issue

The objective is to improve the documentation so that users know that the names they give their clouds determine the instance cap for each cloud. Code logic

[Code Cleanup]: Make ComputeEngineCloud NodeProvisioner compatible

[Migrated from: https://issues.jenkins-ci.org/browse/JENKINS-55412]

When provisioning nodes in ComputeEngineCloud the code currently does a direct out of band call to jenkins.getInstance().addNode(). This is antithetical to the NodeProvisioner workflow which expects to call jenkins.getInstance().addNode() when the PlannedNode's future successfully returns.

Cannot set per-cloud environment variables

Our GCE cloud is configured to not allow direct Internet access, either via external IP or via NAT, and all traffic must leave via a proxy where it can be logged. However, the GCE plugin does not provide any way to set the required http_proxy et al environment variables on a per-cloud basis; the best available workaround would be to set them at the Jenkins global level, but then our non-GCE instances (bare metal, legacy EC2, corp Mac Minis, etc) will have the wrong proxy configured.
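
The requested precedence could be sketched as a simple merge in which per-cloud values override the global ones. This is only an illustration of the desired behavior, not an existing plugin feature; the class name and the proxy URLs used with it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: per-cloud environment variables override global ones.
final class CloudEnvironment {
    static Map<String, String> effective(Map<String, String> global, Map<String, String> perCloud) {
        Map<String, String> merged = new LinkedHashMap<>(global);
        merged.putAll(perCloud); // per-cloud entries win on key collisions
        return merged;
    }
}
```

With this rule, GCE agents would receive the proxy configured for their cloud, while agents outside that cloud keep the global settings.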

Code Cleanup: Replace public fields with getters/setters

Reference: https://issues.jenkins-ci.org/browse/JENKINS-55518

Throughout the code base there are many examples of public fields in classes. This issue tracks the code cleanup work of refactoring these to use proper getters/setters.
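
As a small illustration of the target shape, the sketch below hides a field behind a getter and a setter, which also gives the class a place to validate input. The class and field names are hypothetical and are not taken from the plugin.

```java
// Sketch only: a previously public field, now encapsulated.
final class TimeoutSettings {
    private int launchTimeoutSeconds;

    public int getLaunchTimeoutSeconds() {
        return launchTimeoutSeconds;
    }

    public void setLaunchTimeoutSeconds(int launchTimeoutSeconds) {
        // Validation like this is not possible with a public field.
        if (launchTimeoutSeconds < 0) {
            throw new IllegalArgumentException("launchTimeoutSeconds must not be negative");
        }
        this.launchTimeoutSeconds = launchTimeoutSeconds;
    }
}
```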
