Comments (44)
When I try to change directory to /home/jenkins/./.jenkins-slave with the command 'cd /home/jenkins/./.jenkins-slave', the command is valid. The directory is present (it is ~/.jenkins-slave), so the path does not seem to be the problem.
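As a quick illustration of why that path is valid, here is a minimal sketch (not taken from the plugin) showing that the "./" segment is redundant and collapses to the same directory. It only manipulates the string and does not touch the filesystem.

```python
import posixpath

# Path reported in the agent log, containing a redundant "./" segment.
reported = "/home/jenkins/./.jenkins-slave"

# normpath collapses "." segments purely lexically.
normalized = posixpath.normpath(reported)
print(normalized)
```

So both spellings should refer to the same directory on the agent.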
from google-compute-engine-plugin.
Any update here, please?
from google-compute-engine-plugin.
I'll have to take a closer look, but at first glance it might have to do with this feature:
#17
from google-compute-engine-plugin.
Any update here please?
from google-compute-engine-plugin.
Hi @lukasova, what I mean with #17 is that it might be normal for your instances to be cleaned up after an hour of inactivity. This is intended to save you money when you have instances running that you're not using.
from google-compute-engine-plugin.
No, the instances are not inactive. A job is running on them when they are terminated. That's the problem.
from google-compute-engine-plugin.
Oh, that is very interesting. Can you show me your instance configuration and more logs, if possible?
from google-compute-engine-plugin.
Did you see the following in your logs at all:
hudson.model.AsyncPeriodicWork$1 run
INFO: Started Fingerprint cleanup
from google-compute-engine-plugin.
No, I did not see this INFO message.
I have attached some screenshots of the template used on Google Compute Engine and of the Jenkins configuration. In the screenshot you can see that we use our own company image for CentOS 7, but the same problem appears on other systems (CentOS 6, Debian 9) and also when I try the official CentOS 7 image provided by GCE.
What other logs would you like to see?
from google-compute-engine-plugin.
Today's logs (Jenkins slave log, full Jenkins log, and Jenkins job failure):
jenkins_system_log.txt
jenkins_slave_log_cent7.txt
jenkins_job_failure.txt
I also archived the instance disk, so if you want some logs from the instance, let me know which ones.
from google-compute-engine-plugin.
I have created an archive of the /var/log/ directory from the crashed CentOS 7 instance:
from google-compute-engine-plugin.
OK, so you seem to be SSH'ing in just fine, judging by the "connect fresh as root" INFO log.
However, I see there is a relative remote path of /home/jenkins/./.jenkins-slave,
but it seems agent.jar was copied to /tmp/. It's really interesting that that's the path we get.
Can you SSH into your instance manually and see if this is a valid path? I think it might not be, and we'll have to look further into that. I feel like I've seen this issue before and that it has to do with faulty directories. I'm just not quite sure how these incorrect paths get generated.
from google-compute-engine-plugin.
OK, so I just tried it out with my remote, and with ./ as my remote location I get ./home/jenkins:
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.17
This is a Unix agent
NOTE: Relative remote path resolved to: /home/jenkins/.
Evacuated stdout
Agent successfully connected and online
from google-compute-engine-plugin.
Yes, I forgot to mention it. I also tried to find out why the slave was copied into the /tmp directory, but I didn't find anything about it. I also looked for an event that could have deleted the agent in /tmp, but no cron job or anything like that was started, and agent.jar is still present in the /tmp directory.
It is weird that the instance is always terminated at the same minute of the hour. Yesterday it was every XX:47.
One more question: why is the agent named agent.jar, while when I connect the instance to Jenkins manually the jar is called /home/jenkins/remoting.jar? Is that OK?
Attaching the agent.jar. You may want to check it.
agent.jar.zip
from google-compute-engine-plugin.
Also attaching the System Information for the new CentOS 7 instance I created a few minutes ago.
jenkins_slave_system_information.zip
from google-compute-engine-plugin.
Is #69 a similar problem?
But I have Java 8 installed:
[jenkins@jenkins-gce-cent-7-notimer-jb5y6o home]$ java -version
openjdk version "1.8.0_201"
OpenJDK Runtime Environment (build 1.8.0_201-b09)
OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)
Sometimes the slave is connected and works for almost an hour, and is then suddenly terminated.
from google-compute-engine-plugin.
#69 is because Java 8 was not installed. You don't seem to be having that issue, since your logs print a bunch of Java errors.
from google-compute-engine-plugin.
I doubt this is the issue, but it's worth trying: can you use the same image as me (Debian cloud) and see what happens?
from google-compute-engine-plugin.
Yes, I can try Cloud Debian. How do you connect these machines to Jenkins? Did you generate SSH keys? Did you create a jenkins account? What other special changes did you make on this machine?
from google-compute-engine-plugin.
OK, so I tried running the official Debian image and the situation is the same. Attaching Jenkins logs.
debian_gcloud_official.zip
from google-compute-engine-plugin.
For Linux images, we generate the SSH keys for you. And like I said before, you seem to have no issue SSH'ing. For some reason your agent has trouble running the job.
We're going to put out a new release today and I wonder if that will resolve your issues... I'm not able to reproduce this error and it's not clear at all from the stack trace why this is happening. I will work on this extensively the coming week since I will be on bug duty and can dedicate more bandwidth to issues.
from google-compute-engine-plugin.
> Prepare some template in GCE, use it in Jenkins with Google Compute Engine plugin, start some job and during an hour the machines will be terminated.

When you say "start some job and during an hour", is the job still running when the instance is terminated, or had the job completed and the instance was deleted while you kept it around?
from google-compute-engine-plugin.
I have already generated an SSH key, so that part is OK.
from google-compute-engine-plugin.
I start the job, the instance is created, and the job runs. Then, at a specific time (today every XX:47), the instance is terminated. On the Google Cloud operations page I can see that the termination request (stop and delete the instance) comes from the Jenkins account. After that, the machine is no longer available in Cloud or in Jenkins. I can set an option to keep the disk when the instance is terminated, which I already do, so when I need logs from a deleted machine I create a new one manually in Google Cloud, attach the deleted instance's disk, and connect to it via SSH.
As you can see in the logs I have already attached here, the running job does not complete; it is terminated because the agent was deleted.
from google-compute-engine-plugin.
It is weird that, all day long, the termination happens at a specific minute of the hour. It does not matter whether the instance (job) runs for 10 minutes or 50 minutes. If I run the job at 18:40, the instance is terminated at 18:47. The same happens when I run the job at 17:50: it also crashes at 18:47.
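This pattern is what one would expect from a task that runs on a fixed hourly schedule rather than from a timer that starts with the job. A minimal sketch, assuming a hypothetical hourly task that happens to tick at minute 47 (the date and the function name are illustrative and not taken from the logs or the plugin):

```python
from datetime import datetime, timedelta

def next_periodic_run(job_start: datetime, first_run: datetime, period: timedelta) -> datetime:
    """Return the first tick of a fixed-period task at or after job_start."""
    if job_start <= first_run:
        return first_run
    elapsed = job_start - first_run
    ticks = -(-elapsed // period)  # ceiling division on timedeltas
    return first_run + ticks * period

# Assumption: the periodic task first ran at 00:47 and repeats every hour.
first_run = datetime(2019, 4, 26, 0, 47)
hour = timedelta(hours=1)

for start in (datetime(2019, 4, 26, 18, 40), datetime(2019, 4, 26, 17, 50)):
    print(start.time(), "->", next_periodic_run(start, first_run, hour).time())
```

Both start times map to the same 18:47 tick, which matches the behavior described above regardless of how long the job has been running.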
from google-compute-engine-plugin.
Sorry about the delay.
I was looking at the systems log. Is it possible to get logs before the following line executes?
Apr 26, 2019 1:47:54 PM INFO hudson.remoting.SynchronousCommandTransport$ReaderThread run
The reason I ask is that I want to see whether some other plugin or retention strategy is interfering with the agents and terminating them improperly. I notice the following plugin might have something to do with what's happening:
Apr 26, 2019 1:49:50 PM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog$Statistics writeStatisticsToLog
Watchdog Statistics: Number of overall executions: 7204, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
from google-compute-engine-plugin.
Hello, OK, I am attaching:
- the full Jenkins system log,
- the Jenkins log from the loggers com.nirima.jenkins.plugins.docker.DockerContainerWatchdog and com.google,
- the Jenkins log from official Debian instance 1,
- the Jenkins log from official Debian instance 2.
jenkins_gcloud_log.zip
Thanks
from google-compute-engine-plugin.
Was it helpful?
from google-compute-engine-plugin.
Any update here, please?
from google-compute-engine-plugin.
I can't seem to open any of these files. Did you just save everything from the website? There are lots of web-related files.
I wanted to see the logs because I am guessing there could be other plugins interfering with the instances.
Can you isolate the logs?
from google-compute-engine-plugin.
OK, I've managed to open them.
9:53:27 is where you start getting 404 Not Found for instances.
Just before that, at 9:52:57, I see the following:
May 09, 2019 9:49:50 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
May 09, 2019 9:52:57 AM FINEST com.google.jenkins.plugins.computeengine.CleanLostNodesWork
Starting clean lost nodes worker
May 09, 2019 9:52:57 AM FINEST com.google.jenkins.plugins.computeengine.CleanLostNodesWork
Cleaning cloud Codasip-cloud
However, I'm not seeing any log statements that would indicate we found any instances to terminate. This is possible if no remote instances were found.
However, I'm looking at the method findRemoteInstances (
This may be overkill, but I wonder if you could run Jenkins with your own local build of the plugin and insert more log statements...
from google-compute-engine-plugin.
Alright, at 9:52:58 AM, shortly after the 9:52:57 entry where we saw "Cleaning cloud Codasip-cloud", the log shows:
"zones/europe-west3-c": {
  "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-c' on this page.",
    "data": [
      {
        "key": "scope",
        "value": "zones/europe-west3-c"
      }
    ]
  }
},
"zones/europe-west3-a": {
  "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-a' on this page.",
    "data": [
      {
        "key": "scope",
        "value": "zones/europe-west3-a"
      }
    ]
  }
},
"zones/europe-west3-b": {
  "warning": {
    "code": "NO_RESULTS_ON_PAGE",
    "message": "There are no results for scope 'zones/europe-west3-b' on this page.",
    "data": [
      {
        "key": "scope",
        "value": "zones/europe-west3-b"
      }
    ]
  }
},
The timing of this statement makes me suspect it is because of CleanLostNodesWork.
However, there should be log statements when instances are terminated because of CleanLostNodesWork...
@ingwarsw Care to contribute any input?
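For anyone reading a response like the excerpt above, here is a small sketch of how the per-zone entries could be split into "empty" and "populated" scopes. The helper name split_scopes is hypothetical, and the sample is trimmed from the excerpt (the message and data fields are omitted); this is not code from the plugin.

```python
import json
from typing import Dict, List, Tuple

def split_scopes(scopes: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Split scopes into those flagged NO_RESULTS_ON_PAGE and all others."""
    empty, populated = [], []
    for scope, body in scopes.items():
        code = body.get("warning", {}).get("code")
        if code == "NO_RESULTS_ON_PAGE":
            empty.append(scope)
        else:
            populated.append(scope)
    return sorted(empty), sorted(populated)

# Trimmed copy of the three zone entries shown in the log excerpt.
sample = json.loads("""
{
  "zones/europe-west3-c": {"warning": {"code": "NO_RESULTS_ON_PAGE"}},
  "zones/europe-west3-a": {"warning": {"code": "NO_RESULTS_ON_PAGE"}},
  "zones/europe-west3-b": {"warning": {"code": "NO_RESULTS_ON_PAGE"}}
}
""")

empty, populated = split_scopes(sample)
print("empty:", empty)
print("populated:", populated)
```

For this excerpt, all three europe-west3 zones land in the empty list, meaning the listing found no instances in any of them at that moment.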
from google-compute-engine-plugin.
@lukasova Are you using the latest version of the plugin?
There was a recent fix that stops the plugin from cleaning up instances it does not own.
from google-compute-engine-plugin.
@lukasova Do you perhaps have several Jenkins instances configured with the same cloud?
from google-compute-engine-plugin.
Logs seem to show only 1 cloud?
from google-compute-engine-plugin.
I don't mean many clouds on one Jenkins,
but at least two Jenkins instances with the same cloud (maybe a test instance).
from google-compute-engine-plugin.
@lukasova Check that you are on at least version 3.1.1.
from google-compute-engine-plugin.
That's true. We have two Jenkins instances configured with the same cloud. I never realized it could be related. I will update the plugin on both to version 3.2.0, and if that does not help I will disable the testing Jenkins instance and we'll see. Thank you.
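To illustrate why two Jenkins instances sharing one cloud could cause this, here is a rough sketch under stated assumptions. It is not the plugin's actual code: find_lost_instances and the "owner" label are hypothetical names used only to show the idea of a cleanup pass that treats every remote instance it does not track as lost, and how an ownership filter avoids that.

```python
from typing import Dict, List, Optional, Set

def find_lost_instances(
    remote_instances: List[Dict],
    known_nodes: Set[str],
    owner_id: Optional[str] = None,
) -> List[str]:
    """Return names of remote instances this controller would clean up."""
    lost = []
    for instance in remote_instances:
        if instance["name"] in known_nodes:
            continue  # this controller still tracks the agent
        if owner_id is not None and instance.get("labels", {}).get("owner") != owner_id:
            continue  # created by another controller, leave it alone
        lost.append(instance["name"])
    return lost

# One instance created by the production controller (hypothetical names).
remote = [{"name": "agent-prod-1", "labels": {"owner": "jenkins-prod"}}]
test_controller_nodes: Set[str] = set()  # the test controller tracks no agents

unfiltered = find_lost_instances(remote, test_controller_nodes)
filtered = find_lost_instances(remote, test_controller_nodes, owner_id="jenkins-test")
print(unfiltered)
print(filtered)
```

Without the ownership check, the test controller flags the production agent as lost even though a job is running on it; with the check, it leaves the instance alone.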
from google-compute-engine-plugin.
The problem seems to be fixed after updating the plugin to version 3.2.0. I hope it won't appear again :) Thank you.
from google-compute-engine-plugin.
@lukasova thank you for being patient with us! Glad it worked out.
from google-compute-engine-plugin.
Hi rachely3n,
I'm trying to use the Google Compute Engine plugin but am getting the error "Could not list in region in project ". Please look into it; I can't figure out where I am going wrong.
from google-compute-engine-plugin.
@Mukhtarali212 Usually that issue has to do with your service account credentials. Make sure the credentials you created have the proper permissions.
from google-compute-engine-plugin.
Hi rachely3n,
Thanks for the reference, it resolved that issue. I have one more new issue: there is an error cloning the Git repository on the Jenkins server. When a new instance is provisioned by the GCE plugin, the VM launches and the job is triggered, but it fails with the error.
from google-compute-engine-plugin.
I have upgraded Jenkins to 2.426.1 and I am facing a similar issue. @Mukhtarali212, can you suggest what can be checked here?
from google-compute-engine-plugin.