Comments (44)

lukasova commented on August 13, 2024

When I run 'cd /home/jenkins/./.jenkins-slave', the command works. The directory exists (it is ~/.jenkins-slave), so the path does not seem to be the problem.

from google-compute-engine-plugin.

lukasova commented on August 13, 2024

Any update here, please?

rachely3n commented on August 13, 2024

I'll have to take a closer look, but at first glance it might have to do with this feature:
#17

lukasova commented on August 13, 2024

Any update here, please?

rachely3n commented on August 13, 2024

Hi @lukasova, what I mean with #17 is that it may be normal for your instances to get cleaned up after an hour of inactivity. This is intended to save you money in case you have instances running that you're not using.
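A minimal sketch of an idle-based cleanup rule, assuming #17 behaves like a plain idle timeout (terminate a node only after it has gone 60 minutes without a running job). The class and method names are hypothetical and are not the plugin's API. Under that assumption, a node that is busy with a job never qualifies for cleanup.

```java
import java.time.Duration;
import java.time.Instant;

class IdleRetentionSketch {
    // Hypothetical threshold: one hour of inactivity, as described for #17.
    static final Duration IDLE_TIMEOUT = Duration.ofMinutes(60);

    // Returns true if a node should be terminated under a simple idle-timeout rule.
    // busy:      whether any executor on the node is currently running a job
    // idleSince: when the node last became idle (ignored while busy)
    static boolean shouldTerminate(boolean busy, Instant idleSince, Instant now) {
        if (busy) {
            return false; // a node running a job is never "inactive"
        }
        return Duration.between(idleSince, now).compareTo(IDLE_TIMEOUT) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-04-26T13:47:00Z");
        Instant idleSince = now.minus(Duration.ofMinutes(70));
        System.out.println(shouldTerminate(false, idleSince, now)); // true: idle for 70 minutes
        System.out.println(shouldTerminate(true, idleSince, now));  // false: a job is running
    }
}
```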

lukasova commented on August 13, 2024

No, the instances are not inactive. A job is running on them when they are terminated. That's the problem.

rachely3n commented on August 13, 2024

Oh, that is very interesting. Can you show me your instance configuration and more logs, if possible?

rachely3n commented on August 13, 2024

Did you see the following in your logs at all:

hudson.model.AsyncPeriodicWork$1 run
INFO: Started Fingerprint cleanup

lukasova commented on August 13, 2024

No, I did not see this INFO message.

I have attached some screenshots of the template used on Google Compute Engine and of the Jenkins configuration. As you can see in the screenshots, we use our own company image for CentOS 7, but the same problem appears on other systems (CentOS 6, Debian 9) and also when I try the official CentOS 7 image provided by GCE.

What other logs would you like to see?

cloud_template1
cloud_template2
gce_plugin_jenkins1
gce_plugin_jenkins2

lukasova commented on August 13, 2024

Today's logs (Jenkins slave log, full Jenkins log, and the Jenkins job failure):

jenkins_system_log.txt
jenkins_slave_log_cent7.txt
jenkins_job_failure.txt

I also archived the instance disk, so if you want some logs from the instance, let me know which ones.

lukasova commented on August 13, 2024

I have created an archive of the /var/log/ directory of the crashed CentOS 7 instance:

var_log.zip

rachely3n commented on August 13, 2024

OK, judging by the "connect fresh as root" INFO log, you seem to be SSH'ing in just fine.
However, I see a relative remote path of /home/jenkins/./.jenkins-slave,
but agent.jar seems to have been copied to /tmp/. It's really interesting that this is the path we get.

Can you SSH into your instance manually and check whether this is a valid path? I think it might not be, in which case we'll have to look further into that. I feel like I've seen this issue before, and it had to do with faulty directories. I'm just not quite sure how these incorrect paths get generated.
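Whether the path is valid can be checked without guessing: a "." segment means "current directory", so /home/jenkins/./.jenkins-slave names the same directory as /home/jenkins/.jenkins-slave. A small check using only java.nio.file from the JDK (the paths come from this thread; resolveRemoteRoot is a hypothetical helper, and this does not show how the plugin or remoting itself resolves the path):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RemotePathCheck {
    // Resolve a (possibly relative) remote root against the agent user's home
    // directory, then drop redundant "." and ".." segments.
    static Path resolveRemoteRoot(String home, String remoteFs) {
        return Paths.get(home).resolve(remoteFs).normalize();
    }

    public static void main(String[] args) {
        // Absolute path with a "." segment, as seen in the agent log.
        System.out.println(Paths.get("/home/jenkins/./.jenkins-slave").normalize()); // /home/jenkins/.jenkins-slave (Unix)
        // A relative "./.jenkins-slave" resolved against the home directory.
        System.out.println(resolveRemoteRoot("/home/jenkins", "./.jenkins-slave")); // /home/jenkins/.jenkins-slave (Unix)
    }
}
```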

rachely3n commented on August 13, 2024

OK, I just tried it with my own remote, using ./ as the remote location. This is what I get:

<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.17
This is a Unix agent
NOTE: Relative remote path resolved to: /home/jenkins/.
Evacuated stdout
Agent successfully connected and online

lukasova commented on August 13, 2024

Yes, I forgot to mention it. I also tried to find out why the slave was copied into the /tmp directory, but I didn't find anything about it. I also looked for any event that could delete this agent from /tmp, but no cron job or anything similar was started; agent.jar is still present in the /tmp directory.

It is weird that the instance is always terminated at the same minute of the hour. Yesterday it was at XX:47 every hour.

One more question: why is the agent named agent.jar, while the jar is called /home/jenkins/remoting.jar when I connect the instance to Jenkins manually? Is that OK?

Attaching the agent.jar. You may want to check it.
agent.jar.zip

lukasova commented on August 13, 2024

I'm also attaching the System Information of a new CentOS 7 instance I created a few minutes ago.
jenkins_slave_system_information.zip

lukasova commented on August 13, 2024

Is #69 a similar problem?

But I have Java 8 installed:
[jenkins@jenkins-gce-cent-7-notimer-jb5y6o home]$ java -version
openjdk version "1.8.0_201"
OpenJDK Runtime Environment (build 1.8.0_201-b09)
OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)

Sometimes the slave stays connected and working for almost an hour and is then suddenly terminated.

rachely3n commented on August 13, 2024

#69 was caused by Java 8 not being installed. You don't seem to have that issue, since your logs print a bunch of Java errors.
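Since #69 came down to a missing Java 8, one quick check is the version string the agent's JVM reports. Before Java 9 the string uses the legacy "1.x" scheme (1.8.0_201, as in the output above, means Java 8); from Java 9 on it starts with the major number (for example 11.0.2). A minimal sketch; the class and helper names are hypothetical:

```java
class JavaVersionCheck {
    // Extract the major Java version from a string such as
    // "1.8.0_201" (legacy scheme, Java 8 and earlier) or "11.0.2" (Java 9+).
    static int majorVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Legacy scheme: "1.<major>.<minor>_<update>"
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        // Version of the JVM running this check.
        String running = System.getProperty("java.version");
        int major = majorVersion(running);
        System.out.println(running + " -> Java " + major + (major >= 8 ? " (OK)" : " (too old)"));
    }
}
```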

rachely3n commented on August 13, 2024

I doubt this is the issue, but it's worth trying: can you use the same image as me (Debian Cloud) and see what happens?

lukasova commented on August 13, 2024

Yes, I can try Cloud Debian. How do you connect these machines to Jenkins? Did you generate SSH keys? Did you create a jenkins account? What other special changes did you make on this machine?

lukasova commented on August 13, 2024

OK, so I tried running the official Debian image, and it is the same situation. Attaching the Jenkins logs.
debian_gcloud_official.zip

rachely3n commented on August 13, 2024

For Linux images, we generate the SSH keys for you. And like I said before, you seem to have no issue SSH'ing. For some reason your agent has trouble running the job.

We're going to put out a new release today, and I wonder if that will resolve your issues. I'm not able to reproduce this error, and it's not at all clear from the stack trace why this is happening. I will work on this extensively in the coming week, since I will be on bug duty and can dedicate more bandwidth to issues.

rachely3n commented on August 13, 2024

"Prepare some template in GCE, use it in Jenkins with Google Compute Engine plugin, start some job and during an hour the machines will be terminated."

When you say "start some job and during an hour", is the job still running when the instance is terminated, or did the job complete and you just kept the instance around until it was deleted?

lukasova commented on August 13, 2024

I have already generated an SSH key; that part is OK.

lukasova commented on August 13, 2024

I start the job, the instance is created and the job runs; then at a specific time (today at xx:47 every hour) the instance is terminated. On the Google Cloud operations page I can see that the termination request comes from the jenkins account (stop and delete the instance). After that, the machine is no longer available in the Cloud or in Jenkins. I can set an option to keep the disk when the instance is terminated, which I already do: when I need logs from a deleted machine, I manually create a new instance in Google Cloud from the deleted instance's disk and connect to it via SSH.

As you can see in the logs I have already attached here, the running job does not complete; it is terminated because the agent was deleted.

lukasova commented on August 13, 2024

It is weird that the termination happens all day at a specific minute of the hour. It does not matter whether the instance (job) runs for 10 minutes or 50 minutes. If I run the job at 18:40, the instance is terminated at 18:47. The same happens when I run the job at 17:50: it also crashes at 18:47.
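A fixed minute like this is what a periodic background task would produce, independent of when the job started. A sketch, assuming a cleanup task that fires every 60 minutes at a fixed phase (here :47, the minute reported above): any instance alive at the next firing is exposed, whether its job started 7 or 57 minutes earlier. The class, method, and anchor time are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class HourlyPhaseSketch {
    // Assumed recurrence period of the background task.
    static final Duration PERIOD = Duration.ofHours(1);

    // Next firing strictly after t for a task that first ran at anchor
    // and then repeats every PERIOD.
    static LocalDateTime nextRun(LocalDateTime anchor, LocalDateTime t) {
        long periodsElapsed = Duration.between(anchor, t).toMinutes() / PERIOD.toMinutes();
        LocalDateTime candidate = anchor.plus(PERIOD.multipliedBy(periodsElapsed));
        while (!candidate.isAfter(t)) {
            candidate = candidate.plus(PERIOD);
        }
        return candidate;
    }

    public static void main(String[] args) {
        LocalDateTime anchor = LocalDateTime.of(2019, 4, 26, 9, 47); // hypothetical first run
        System.out.println(nextRun(anchor, LocalDateTime.of(2019, 4, 26, 18, 40))); // 2019-04-26T18:47
        System.out.println(nextRun(anchor, LocalDateTime.of(2019, 4, 26, 17, 50))); // 2019-04-26T18:47
    }
}
```

Both start times from the report land on the same 18:47 firing, which matches the observed pattern.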

rachely3n commented on August 13, 2024

Sorry about the delay.
I was looking at the system log. Is it possible to get the logs from before the following line?
Apr 26, 2019 1:47:54 PM INFO hudson.remoting.SynchronousCommandTransport$ReaderThread run

I ask because I want to see whether some other plugin or retention strategy is interfering with the agents and terminating them improperly. I notice that the following plugin might have something to do with what's happening:

Apr 26, 2019 1:49:50 PM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog$Statistics writeStatisticsToLog

Watchdog Statistics: Number of overall executions: 7204, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms

lukasova commented on August 13, 2024

Hello, OK, I am attaching:

  • the full Jenkins system log,
  • the Jenkins log from the com.nirima.jenkins.plugins.docker.DockerContainerWatchdog and com.google loggers,
  • the Jenkins log from official Debian instance 1,
  • the Jenkins log from official Debian instance 2.

jenkins_gcloud_log.zip

Thanks

lukasova commented on August 13, 2024

Was it helpful?

lukasova commented on August 13, 2024

Any update here, please?

rachely3n commented on August 13, 2024

I can't seem to open any of these files. Did you just save everything from the website? There are lots of web-related files.

I wanted to see the logs because I suspect other plugins could be interfering with the instances.
Can you isolate the logs?

rachely3n commented on August 13, 2024

OK, I've managed to open them.
At 9:53:27 you start getting 404 Not Found for the instances.

At 9:52:57 I see the following:

May 09, 2019 9:49:50 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
May 09, 2019 9:52:57 AM FINEST com.google.jenkins.plugins.computeengine.CleanLostNodesWork
Starting clean lost nodes worker
May 09, 2019 9:52:57 AM FINEST com.google.jenkins.plugins.computeengine.CleanLostNodesWork
Cleaning cloud Codasip-cloud

However, I'm not seeing any log statements that would indicate we found any instances to terminate. This is possible if no remote instances were found.
However, I'm looking at the method findRemoteInstances (private List<Instance> findRemoteInstances(ComputeEngineCloud cloud)), and we should be finding remote instances.

This may be overkill, but I wonder if you could run Jenkins with your own local build of the plugin and insert more log statements...

rachely3n commented on August 13, 2024

Alright, at 9:52:58 AM, one second after the 9:52:57 entry where we saw "Cleaning cloud Codasip-cloud", I see:

  "zones/europe-west3-c": {
   "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-c' on this page.",
    "data": [
     {
      "key": "scope",
      "value": "zones/europe-west3-c"
     }
    ]
   }
  },
  "zones/europe-west3-a": {
   "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-a' on this page.",
    "data": [
     {
      "key": "scope",
      "value": "zones/europe-west3-a"
     }
    ]
   }
  },
  "zones/europe-west3-b": {
   "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-b' on this page.",
    "data": [
     {
      "key": "scope",
      "value": "zones/europe-west3-b"
     }
    ]
   }
  },
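In an aggregated list response like this one, each "zones/<zone>" key maps either to that zone's instances or, when the zone has nothing to return, to a NO_RESULTS_ON_PAGE warning. A sketch of flattening such a response into a single list, using simplified hypothetical types (ScopedList and flatten are illustrative, not the Compute Engine client classes):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AggregatedListSketch {
    // Simplified stand-in for one "zones/<zone>" entry of the response.
    static final class ScopedList {
        final List<String> instances; // instance names, or null when the scope is empty
        final String warningCode;     // e.g. "NO_RESULTS_ON_PAGE", or null

        ScopedList(List<String> instances, String warningCode) {
            this.instances = instances;
            this.warningCode = warningCode;
        }
    }

    // Collect instances from every scope, skipping scopes that carry only a warning.
    static List<String> flatten(Map<String, ScopedList> items) {
        List<String> result = new ArrayList<>();
        for (ScopedList scoped : items.values()) {
            if (scoped.instances == null) {
                continue; // e.g. NO_RESULTS_ON_PAGE: nothing in this zone
            }
            result.addAll(scoped.instances);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ScopedList> items = new LinkedHashMap<>();
        items.put("zones/europe-west3-a", new ScopedList(null, "NO_RESULTS_ON_PAGE"));
        items.put("zones/europe-west3-b", new ScopedList(List.of("agent-1"), null)); // hypothetical instance
        System.out.println(flatten(items)); // [agent-1]
    }
}
```

A response in which every zone carries only the warning flattens to an empty list, so a cleanup pass would find nothing to act on in those zones.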

The timing of this response makes me suspect the terminations come from CleanLostNodesWork.
However, there should be log statements when instances are terminated by CleanLostNodesWork...

@ingwarsw Care to contribute any input?

ingwarsw commented on August 13, 2024

@lukasova Are you using the latest version of the plugin?
There was a recent fix related to cleaning up instances the plugin does not own.

ingwarsw commented on August 13, 2024

@lukasova Do you maybe have several Jenkins instances configured with the same cloud?

rachely3n commented on August 13, 2024

The logs seem to show only one cloud?

ingwarsw commented on August 13, 2024

Not several clouds on one Jenkins,
but at least two Jenkins instances with the same cloud (maybe some test instance).
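Why two controllers sharing one cloud can delete each other's agents: a lost-node cleaner that treats every matching instance in the project as its own, and deletes those it has no local node for, will also delete the other controller's instances. A sketch of that failure and of scoping the cleanup with an owner label. All names, including the OWNER_LABEL key, are hypothetical; this is not the plugin's actual code or its actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LostNodeCleanupSketch {
    // Hypothetical label key recording which controller created an instance.
    static final String OWNER_LABEL = "jenkins-owner";

    static final class RemoteInstance {
        final String name;
        final Map<String, String> labels;

        RemoteInstance(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }
    }

    // Naive rule: anything in the project without a local node is "lost".
    static List<String> lostNaive(List<RemoteInstance> remote, Set<String> localNodes) {
        List<String> lost = new ArrayList<>();
        for (RemoteInstance i : remote) {
            if (!localNodes.contains(i.name)) {
                lost.add(i.name);
            }
        }
        return lost;
    }

    // Scoped rule: only instances labeled with this controller's ID can be "lost".
    static List<String> lostOwned(List<RemoteInstance> remote, Set<String> localNodes, String ownerId) {
        List<String> lost = new ArrayList<>();
        for (RemoteInstance i : remote) {
            boolean ours = ownerId.equals(i.labels.get(OWNER_LABEL));
            if (ours && !localNodes.contains(i.name)) {
                lost.add(i.name);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<RemoteInstance> remote = List.of(
                new RemoteInstance("prod-agent", Map.of(OWNER_LABEL, "prod")),
                new RemoteInstance("test-agent", Map.of(OWNER_LABEL, "test")));
        // The test controller only knows about its own node.
        Set<String> testNodes = Set.of("test-agent");
        System.out.println(lostNaive(remote, testNodes));         // [prod-agent]: would delete prod's agent
        System.out.println(lostOwned(remote, testNodes, "test")); // []
    }
}
```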

ingwarsw commented on August 13, 2024

@lukasova Check that you are on at least version 3.1.1.

lukasova commented on August 13, 2024

That's true. We have two Jenkins instances configured with the same cloud; I never realized it could be related. I will update the plugin on both to version 3.2.0, and if that does not help, I will disable the testing Jenkins and we'll see. Thank you.

lukasova commented on August 13, 2024

The problem seems to be fixed after updating the plugin to version 3.2.0. Hopefully it won't appear again :) Thank you.

rachely3n commented on August 13, 2024

@lukasova thank you for being patient with us! Glad it worked out.

Mukhtarali212 commented on August 13, 2024

Hi rachely3n,

I'm trying to use the Google Compute Engine plugin, but I'm getting the error "Could not list in region in project". Please take a look; I can't figure out where I'm going wrong.
Screenshot from 2020-04-01 11-38-55

rachely3n commented on August 13, 2024

@Mukhtarali212 Usually that issue has to do with your service account credentials. Make sure the credentials you created have the proper permissions.

For reference: https://cloud.google.com/solutions/using-jenkins-for-distributed-builds-on-compute-engine#configure_cloud_identity_and_access_management

Mukhtarali212 commented on August 13, 2024

Hi rachely3n,

Thanks for the reference; it resolved that issue. I have one more new issue, please take a look: there is an error cloning the Git repository in Jenkins. When a new instance is provisioned by the GCE plugin, the VM launches and the job is triggered, but then I get the error.
Screenshot from 2020-05-05 15-46-39
Screenshot from 2020-05-04 12-25-02

sindhu-chilukuri commented on August 13, 2024

I have upgraded Jenkins to 2.426.1 and I am facing a similar issue. @Mukhtarali212, can you suggest what can be checked here?
