cryostat-agent's People

Contributors

aali309, andrewazores, dependabot[bot], ebaron, josh-matsuoka, maxcao13, mwangggg

cryostat-agent's Issues

[Story] Agent dual-registration

Currently, the Agent implements a Cryostat Discovery Plugin by creating a plugin registration (and corresponding REALM node), and defines a single target JVM node within that Realm, representing itself. This will normally have an Agent HTTP connectUrl, unless the config property for prefer-jmx is set, in which case the Agent will publish a JMX ServiceURL for itself if the JVM it's attached to appears to have JMX enabled.

Why not publish both URLs? The Agent HTTP URL will always be available and publish-able as one target JVM node. After #163 this URL may also support write/mutation operations. A simple check endpoint can be added to return that status so that the available features can be queried without first trying to perform a write operation and seeing if it fails. A second target JVM node under the same Realm could be optionally published as well, if the host JVM appears to have JMX enabled. Then the prefer-jmx config can be dropped.
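
A minimal sketch of the dual-publication idea, assuming each target node can be reduced to its connect URL. `DualRegistrationSketch` and `connectUrls` are hypothetical names and not part of the Agent's code. The HTTP URL is always included, and the JMX ServiceURL is added only when one was detected.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class DualRegistrationSketch {
    /**
     * Returns the connect URLs to publish under the Agent's Realm:
     * the Agent HTTP URL always, plus the JMX ServiceURL when present.
     */
    static List<URI> connectUrls(URI agentHttpUrl, Optional<URI> jmxServiceUrl) {
        List<URI> urls = new ArrayList<>();
        urls.add(agentHttpUrl);          // always available
        jmxServiceUrl.ifPresent(urls::add); // only if the host JVM appears to have JMX enabled
        return urls;
    }
}
```

With this shape there is no either/or decision left to make, which is why a prefer-jmx style switch would no longer be needed.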

[Bug] Interplay of Cryostat discovery ping period and harvester period results in reset intervals and timing skew

Ah yes, I think that makes sense given the way the agent's internal state machinery works. The discovery ping request tells the agent to re-register itself, which really means deregister and start the registration flow over. But the harvester also intentionally stops itself when the agent becomes deregistered, and starts again on successful registration. So if the discovery ping period is shorter than the harvester period, the agent will probably never even attempt to push harvested files, or if it does, the push may sometimes get interrupted partway through.

There's probably some approach I could take like adding an internal state for "re-registering" to allow this to smoothly transition over, but I'm not sure how worthwhile it is.

This does point out an interesting side-effect of the discovery ping though, because it means that the harvester period will result in files being pushed with that periodicity on one interval, but the interval will reset at the next re-registration time. If the discovery ping period is relatively long and the harvester period is relatively short then this won't be very noticeable, but as these two periods become closer in value then the skew could become noticeable, up to the point you've identified where the skew actually can prevent any harvesting from happening at all.

I think that's a separate issue to work on, maybe for next development cycle.

Originally posted by @andrewazores in #86 (comment)
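
The skew described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: time starts at registration, every ping resets the harvester timer, and a ping that coincides with a due upload wins. `PingSkewSimulation` and `uploadTimes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PingSkewSimulation {
    /**
     * Returns the times (ms after the initial registration) at which uploads
     * fire, given that each discovery ping restarts the harvester timer.
     */
    static List<Long> uploadTimes(long harvestPeriodMs, long pingPeriodMs, long horizonMs) {
        List<Long> uploads = new ArrayList<>();
        long nextUpload = harvestPeriodMs;
        long nextPing = pingPeriodMs;
        while (Math.min(nextUpload, nextPing) <= horizonMs) {
            if (nextPing <= nextUpload) {
                // Re-registration: the harvester restarts and its interval resets.
                nextUpload = nextPing + harvestPeriodMs;
                nextPing += pingPeriodMs;
            } else {
                uploads.add(nextUpload);
                nextUpload += harvestPeriodMs;
            }
        }
        return uploads;
    }
}
```

With a harvester period of 100 and a ping period of 250, the uploads land at 100, 200, 350, 450 rather than every 100. With a harvester period of 300 and a ping period of 250, the list stays empty, which matches the "never pushes" case.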

[Task] CI fails to download `cryostat-core` dependency

For example:

#125
(https://github.com/cryostatio/cryostat-agent/actions/runs/5070291951/jobs/9105123984?pr=125)
The project's pom.xml is configured to search the GitHub Maven Packages repository, but authentication is failing.

<repositories>

However, the CI workflow does have a step that is supposed to set up the Maven settings.xml that provides the credentials:

servers: '[{"id": "github", "username": "dummy", "password": "${env.GITHUB_TOKEN_REF}"}]'

And the same or very similar configuration is used in the main Cryostat repository, which works and is able to download the cryostat-core dependency:

https://github.com/cryostatio/cryostat/blob/2b0cd5adacf8f928f8666f36663b0a5b31e0e76b/.github/workflows/ci-jobs.yml#L53

[Task] Implement scheduled periodic recording uploads

The Agent should expose configuration properties for periodic pushes of captured JFR data to the Cryostat backend. After the Agent has successfully registered with Cryostat and started a Flight Recording locally, it should periodically push the current JFR buffer contents to Cryostat as a full .jfr file. In the future there can also be work done to stream the data, perhaps on a per-event basis or at least on a per-chunk basis, but that is out of scope for this initial implementation.
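
A minimal sketch of the periodic push, assuming the actual JFR dump and upload is wrapped in a `Runnable`. `PeriodicPushSketch`, `schedule`, and `demo` are hypothetical names. It uses the JDK's `ScheduledExecutorService` and is not the Agent's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class PeriodicPushSketch {
    /** Schedules push every periodMs; the first run happens after one full period. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor, Runnable push, long periodMs) {
        return executor.scheduleAtFixedRate(() -> {
            try {
                push.run();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs, so log and continue.
                System.err.println("push failed: " + e);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Demo helper: true if the push ran at least `times` times within timeoutMs. */
    static boolean demo(int times, long periodMs, long timeoutMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(times);
        try {
            schedule(executor, latch::countDown, periodMs);
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The period would come from an Agent configuration property. The try/catch matters because `scheduleAtFixedRate` stops rescheduling a task once one execution throws.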

[Bug] Discovery plugin ping reregistration triggers recording cancellation

2022-12-23 20:47:55,799 INFO  [io.cry.age.Registration] (cryostat-agent-worker-0) Registration retry period: 5000(ms)
2022-12-23 20:48:01,551 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) Registered as f2c4a745-db78-495a-acbe-327d8b8a87bf
2022-12-23 20:48:01,555 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester starting
2022-12-23 20:48:01,926 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester started
2022-12-23 20:48:01,938 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) publishing self as http://localhost:9977/
2022-12-23 20:48:02,148 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) RUNNING
2022-12-23 20:48:02,270 INFO  [io.cry.age.Registration] (cryostat-agent-worker-2) Publish success
2022-12-23 20:52:59,255 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) Registered as f2c4a745-db78-495a-acbe-327d8b8a87bf
2022-12-23 20:52:59,256 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester starting
2022-12-23 20:52:59,257 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) JFR Harvester started
2022-12-23 20:52:59,264 INFO  [io.cry.age.Registration] (cryostat-agent-worker-1) publishing self as http://localhost:9977/
2022-12-23 20:52:59,288 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) STOPPED
2022-12-23 20:52:59,289 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-2) cryostat-agent(1) CLOSED
2022-12-23 20:52:59,302 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) Uploading quarkus-test-agent_default_20221223T205259Z.jfr
2022-12-23 20:57:57,387 INFO  [io.cry.age.Agent] (SIGTERM handler) Caught SIGTERM(15)

I triggered this by setting up the main Cryostat smoketest.sh with the default discovery ping period and an Agent harvester period of 300000 ms (5 minutes). The Agent never ended up pushing any harvested JFR files to Cryostat. As the logs reveal, the re-registration occurred, which restarted the harvester. This ended up cancelling the running recording without restarting it.

The harvester should not be restarted when the Agent re-registers as a discovery plugin with Cryostat - if the harvester is already running and already has periodic tasks set up to push harvested files, those should continue as usual with no interruption to the push schedule.
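
One way to get this behaviour is to make the harvester's start idempotent, so a repeated "registered" event is a no-op while the schedule is live. The sketch below is an illustration only. `HarvesterSketch` is a hypothetical class and does not mirror the real `io.cryostat.agent.Harvester`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class HarvesterSketch {
    private final ScheduledExecutorService executor;
    private final Runnable push;
    private final long periodMs;
    private ScheduledFuture<?> task; // null while stopped

    HarvesterSketch(ScheduledExecutorService executor, Runnable push, long periodMs) {
        this.executor = executor;
        this.push = push;
        this.periodMs = periodMs;
    }

    /** Starts the periodic push; returns false and changes nothing if already running. */
    synchronized boolean start() {
        if (task != null && !task.isDone()) {
            return false; // keep the existing schedule untouched
        }
        task = executor.scheduleAtFixedRate(push, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Cancels the schedule; intended for a real deregistration or shutdown only. */
    synchronized void stop() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }
}
```

Calling `start()` again on re-registration then leaves the push schedule, and the recording behind it, alone.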

[Task] Add configuration for scheduled recording `maxage`/`maxsize`

Currently, only recordings pushed on application exit have the maxage/maxsize properties set. Recordings pushed on a periodic basis during normal application runtime do not, so the file size can grow large and is likely to contain duplicate data from the last periodic push. This wastes network bandwidth and archival disk storage space.

[Epic] Collect JFR data locally and push to Cryostat

The Agent should provide the capability to start Flight Recordings on the attached JVM, similar to what the flags conventionally do:

-XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr

i.e. there should be configuration properties that the Agent picks up and uses to start a recording with a given event template (".jfc profile") name and maxAge/maxSize settings. Rather than providing a local filesystem path where the recording should be dumped, the Agent should push the recording to Cryostat over HTTP.
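
A rough sketch of starting such a recording in-process with the JDK's `jdk.jfr` API. `LocalRecordingSketch`, the recording name, and the parameter names are assumptions for illustration. In the Agent these values would come from its configuration properties.

```java
import java.io.IOException;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

final class LocalRecordingSketch {
    /**
     * Starts a recording from a predefined event template (for example
     * "default" or "profile") with the given retention limits.
     */
    static Recording start(String templateName, Duration maxAge, long maxSizeBytes) {
        Configuration template;
        try {
            template = Configuration.getConfiguration(templateName);
        } catch (IOException | ParseException e) {
            throw new IllegalArgumentException("unknown or unreadable template: " + templateName, e);
        }
        Recording recording = new Recording(template);
        recording.setName("sketch-recording"); // hypothetical name
        recording.setToDisk(true);
        recording.setMaxAge(maxAge);           // bound retained data by age
        recording.setMaxSize(maxSizeBytes);    // bound retained data by size
        recording.start();
        return recording;
    }
}
```

The returned `Recording` can later be written out with `Recording.dump(Path)` and the resulting file sent to Cryostat. That upload step is left out here.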

The Agent should have a configuration to push the latest data to Cryostat periodically, similar to what Cryostat's Automated Rules do but with an HTTP push instead of a JMX pull data flow.

The Agent should also be able to intercept SIGTERM (and other signals?) and send the latest recording data to Cryostat before propagating the signal and allowing the JVM to continue clean shutdown.

[Bug] `POST` to Cryostat failure

Logs from the Cryostat backend:

Caused by: io.cryostat.net.web.http.api.v2.ApiException: No recording submission
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostHandler.handleAuthenticated(RecordingsFromIdPostHandler.java:201)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl.handle(BodyHandlerImpl.java:93)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl.handle(BodyHandlerImpl.java:46)
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostBodyHandler.handleAuthenticated(RecordingsFromIdPostBodyHandler.java:117)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.CorsHandlerImpl.handle(CorsHandlerImpl.java:189)
	at io.vertx.ext.web.handler.impl.CorsHandlerImpl.handle(CorsHandlerImpl.java:41)
	at io.cryostat.net.web.http.generic.CorsEnablingHandler.handle(CorsEnablingHandler.java:131)
	at io.cryostat.net.web.http.generic.CorsEnablingHandler.handle(CorsEnablingHandler.java:63)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:189)
	at io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:48)
	at io.cryostat.net.web.http.generic.RequestLoggingHandler.handle(RequestLoggingHandler.java:123)
	at io.cryostat.net.web.http.generic.RequestLoggingHandler.handle(RequestLoggingHandler.java:65)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:68)
	at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:37)
	at io.cryostat.net.HttpServer$HandlerDelegate.handle(HttpServer.java:168)
	at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:55)
	at io.vertx.core.impl.DuplicatedContext.emit(DuplicatedContext.java:158)
	at io.vertx.core.http.impl.Http2ServerRequest.dispatch(Http2ServerRequest.java:122)
	at io.vertx.core.http.impl.Http2ServerStream.onHeaders(Http2ServerStream.java:96)
	at io.vertx.core.http.impl.Http2ServerConnection.onHeadersRead(Http2ServerConnection.java:155)
	at io.vertx.core.http.impl.Http2ConnectionBase.onHeadersRead(Http2ConnectionBase.java:202)
	at io.vertx.core.http.impl.Http2ServerConnection.onHeadersRead(Http2ServerConnection.java:44)
	at io.netty.handler.codec.http2.Http2FrameListenerDecorator.onHeadersRead(Http2FrameListenerDecorator.java:48)
	at io.netty.handler.codec.http2.Http2EmptyDataFrameListener.onHeadersRead(Http2EmptyDataFrameListener.java:63)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:337)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:56)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:476)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:484)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
	at io.netty.handler.codec.http2.DecoratingHttp2ConnectionDecoder.decodeFrame(DecoratingHttp2ConnectionDecoder.java:63)
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:393)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
	at io.vertx.core.http.impl.VertxHttp2ConnectionHandler.channelRead(VertxHttp2ConnectionHandler.java:408)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Feb 06, 2023 6:59:43 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
WARNING: 127.0.0.1 - - [Mon, 6 Feb 2023 18:59:43 GMT] 2ms "POST /api/beta/recordings/AAl-2I7mLgO4H2DtGz1gApO8nIVGgvrGmA2Q6AC482A= HTTP/2.0" 400 97 bytes "-" "Java-http-client/17.0.5"

The cryostat-agent logs just show a generic 400 error response when the harvester attempts to push a file to storage.

[Story] Separate TLS keystore/truststore from host application

The Agent contains an HTTP webserver that exposes its API for Cryostat. This webserver should support HTTPS, and it should be possible to configure the Agent to use a TLS keystore and certificate for securing this HTTPS server. It should be possible for this TLS keystore to be separate from any the attached target application might already have.

The Agent also contains an HTTP client that it uses for communicating with Cryostat, to register itself as a discovery plugin, publish information about itself, etc. This HTTP client should support HTTPS that the Cryostat server may expose. Likewise, it should be possible to configure the Agent to add the Cryostat server's TLS certificate to the Agent's truststore, and it should be possible to configure it so that this Agent truststore is separate from the truststore of the attached target application.

Related: cryostatio/cryostat-operator#595

[Task] Remove Vert.x dependency

To slim down the built JAR, simplify the implementation, and have fewer rebuilds due to dependency updates, the Vert.x dependency should be removed. The JDK has a workable HTTP client built in that can be used instead of vertx-web's client, and the Vert.x EventBus usage can be replaced by a simpler callback pattern or a simple internal message queue. The Vert.x HTTP server can be replaced by either the non-JDK-API com.sun.net.httpserver.HttpServer or some other embedded HTTP server implementation.
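
To give a feel for the JDK-only replacement, here is a small sketch that pairs `com.sun.net.httpserver.HttpServer` with `java.net.http.HttpClient` (JDK 11+). `JdkHttpSketch` and the `/health` path are hypothetical and are not the Agent's real endpoints.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class JdkHttpSketch {
    /** Starts a loopback-only server on an ephemeral port with one handler. */
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/health", exchange -> {
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends a GET with the JDK client and returns the response status code. */
    static int getStatus(URI uri) {
        try {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Round trip: start the server, call it once, stop it. */
    static int roundTrip() {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            return getStatus(URI.create("http://127.0.0.1:" + port + "/health"));
        } finally {
            server.stop(0);
        }
    }
}
```

Note that `java.net.http.HttpClient` only exists from JDK 11 onward, so a JDK 8 target would need a different client.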

[Bug] `max-files` config results in HTTP 400 from server

Jan 30, 2023 4:19:47 PM io.cryostat.core.log.Logger warn
WARNING: HTTP 400: maxFiles must be a positive integer
io.vertx.ext.web.handler.HttpException: Bad Request
Caused by: io.cryostat.net.web.http.api.v2.ApiException: maxFiles must be a positive integer
	at io.cryostat.net.web.http.api.beta.RecordingsFromIdPostHandler.handleAuthenticated(RecordingsFromIdPostHandler.java:190)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:173)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:140)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.doEnd(BodyHandlerImpl.java:355)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.uploadEnded(BodyHandlerImpl.java:321)
	at io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.lambda$null$0(BodyHandlerImpl.java:250)
	at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.Composition$1.onSuccess(Composition.java:62)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.Composition$1.onSuccess(Composition.java:62)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.SucceededFuture.addListener(SucceededFuture.java:88)
	at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:43)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211)
	at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23)
	at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49)
	at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Jan 30, 2023 4:19:47 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
WARNING: 127.0.0.1 - - [Mon, 30 Jan 2023 16:19:47 GMT] 36ms "POST /api/beta/recordings/92biA-rcHLQhcbkOrujsPQY_rQY7ZBCZlQZS76TBt28= HTTP/1.1" 400 139 bytes "-" "Java-http-client/17.0.5"

[Task] Target JDK11 (or JDK8?)

For better compatibility with existing applications in production, the agent should target and be built with JDK11 or even JDK8, rather than the current JDK17.

[Story] Agent credentials storage conflicts with authenticated JMX

The Agent generates its HTTP webserver credentials and then stores them in the Cryostat server's encrypted keyring along with the other stored credentials. Cryostat consults these stored credentials and their matchExpressions to determine what credentials to pass when initiating a JMX connection. If the Agent is installed into a target application that is configured with JMX authentication, then the Agent will store credentials that also match the target's JMX definition, since the matchExpression used by the agent is target.jvmId == 'abcd1234'. This conflict results in Cryostat failing to establish a JMX connection to the target since the Agent's webserver credentials are incorrectly passed.

[Bug] Agent blocks graceful shutdown if it didn't start cleanly

Description:

smoketest.sh: add this sample app config

    podman run \
        --name quarkus-test-agent-0 \
        --pod cryostat-pod \
        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar" \
        --env QUARKUS_HTTP_PORT=10009 \
        --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
        --rm -d quay.io/andrewazores/quarkus-test:latest

Run sh smoketest.sh, let everything spin up.

podman logs quarkus-test-agent-0:

Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec  java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -cp "." -jar /deployments/quarkus-run.jar 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/deployments/app/cryostat-agent.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/deployments/lib/main/org.jboss.slf4j.slf4j-jboss-logmanager-1.1.0.Final.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Slf4jLoggerFactory]
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2022-12-23 20:38:58,013 INFO  [io.cry.age.Agent] (cryostat-agent) Cryostat Agent starting...
2022-12-23 20:39:00,678 INFO  [io.cry.cor.net.JFRConnectionToolkit] (cryostat-agent) Computed self JVM ID: 7Rprt-Rwnu9Pc4qkKe7G1nDZw6_s6fT2Ry--P8vNZXw=
2022-12-23 20:39:00,746 SEVERE [io.cry.age.Agent] (cryostat-agent) Agent startup failure: java.util.NoSuchElementException: SRCFG00040: The config property cryostat.agent.baseuri is defined as the empty String ("") which the following Converter considered to be null: io.smallrye.config.ImplicitConverters$ConstructorConverter
	at io.smallrye.config.SmallRyeConfig.convertValue(SmallRyeConfig.java:284)
	at io.smallrye.config.SmallRyeConfig.getValue(SmallRyeConfig.java:239)
	at io.smallrye.config.SmallRyeConfig.getValue(SmallRyeConfig.java:167)
	at io.cryostat.agent.ConfigModule.provideCryostatAgentBaseUri(ConfigModule.java:91)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.provideCryostatAgentBaseUri(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:36)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.get(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:27)
	at io.cryostat.agent.ConfigModule_ProvideCryostatAgentBaseUriFactory.get(ConfigModule_ProvideCryostatAgentBaseUriFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.MainModule_ProvideCryostatClientFactory.get(MainModule_ProvideCryostatClientFactory.java:48)
	at io.cryostat.agent.MainModule_ProvideCryostatClientFactory.get(MainModule_ProvideCryostatClientFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.MainModule_ProvideRegistrationFactory.get(MainModule_ProvideRegistrationFactory.java:55)
	at io.cryostat.agent.MainModule_ProvideRegistrationFactory.get(MainModule_ProvideRegistrationFactory.java:10)
	at dagger.internal.DoubleCheck.get(DoubleCheck.java:47)
	at io.cryostat.agent.DaggerAgent_Client.registration(DaggerAgent_Client.java:117)
	at io.cryostat.agent.Agent.main(Agent.java:99)
	at io.cryostat.agent.Agent.lambda$agentmain$5(Agent.java:122)
	at java.base/java.lang.Thread.run(Thread.java:833)

2022-12-23 20:39:03,896 INFO  [io.quarkus] (main) quarkus-test 1.0.0-SNAPSHOT on JVM (powered by Quarkus 2.7.2.Final) started in 5.819s. Listening on: http://0.0.0.0:10009
2022-12-23 20:39:03,897 INFO  [io.quarkus] (main) Profile prod activated. 
2022-12-23 20:39:03,897 INFO  [io.quarkus] (main) Installed features: [cdi, rest-client, rest-client-jackson, resteasy, smallrye-context-propagation, vertx]

Ctrl-c to tear down smoketest.sh:

...
^C+ cleanup
+ podman pod stop cryostat-pod
WARN[0010] StopSignal SIGTERM failed to stop container quarkus-test-agent-0 in 10 seconds, resorting to SIGKILL

Expected:

If any required configuration property is not set, the Agent should gracefully handle this and fail to start. SIGTERMing the JVM should result in a normal clean shutdown.
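
A sketch of the kind of up-front check this implies, assuming the configuration is available as a plain map. `RequiredConfigSketch` and `missing` are hypothetical names. The real Agent reads its properties through SmallRye Config, which is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RequiredConfigSketch {
    /** Returns the required keys that are absent or blank, in the order given. */
    static List<String> missing(Map<String, String> props, List<String> requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = props.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

If the returned list is non-empty, the Agent entry point could log the missing keys and return before creating any threads or handlers, the intent being that nothing is left running that could delay a later SIGTERM.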

[Task] Push recordings to per-agent API endpoint

When the Agent pushes JFR file data to Cryostat, it should do so at an API endpoint that corresponds to the specific Agent instance, i.e. some unique identifier relating to the Agent, which both the Agent and Cryostat agree upon, should be included as an endpoint path parameter or otherwise in the request metadata, so that Cryostat can collect the pushed files and link them to the Agent instance origin for later queries.

Depends on cryostatio/cryostat#1299

[Bug] SEVERE: recordings serialization failure java.lang.NullPointerException: Cannot invoke "java.time.Duration.toMillis()"

May 05, 2023 8:23:29 PM io.cryostat.agent.remote.RecordingsContext handle
SEVERE: recordings serialization failure
java.lang.NullPointerException: Cannot invoke "java.time.Duration.toMillis()" because the return value of "jdk.jfr.Recording.getMaxAge()" is null
	at io.cryostat.agent.remote.RecordingsContext$RecordingInfo.<init>(RecordingsContext.java:126)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at io.cryostat.agent.remote.RecordingsContext.getRecordings(RecordingsContext.java:99)
	at io.cryostat.agent.remote.RecordingsContext.handle(RecordingsContext.java:78)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:71)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:851)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at io.cryostat.agent.WebServer$CompressionFilter.doFilter(WebServer.java:267)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at io.cryostat.agent.WebServer$RequestLoggingFilter.doFilter(WebServer.java:212)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:818)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
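
The trace shows `jdk.jfr.Recording.getMaxAge()` returning null, which the JDK does when no maximum age has been set. A null-safe conversion could look like the sketch below. `MaxAgeSketch` is a hypothetical name, and mapping "no limit" to 0 is an assumption about what the serialized form should carry.

```java
import java.time.Duration;

final class MaxAgeSketch {
    /** Converts a possibly-null max age to milliseconds; null (no limit) becomes 0. */
    static long maxAgeMillis(Duration maxAge) {
        return maxAge == null ? 0L : maxAge.toMillis();
    }
}
```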

[Bug] Discovery ping causes deregistration and bad side effects

  • #90 : timing skew can become total failure to upload periodic recordings when the discovery ping period is shorter than the upload period
  • discovery ping signal causes Agent to deregister and re-register. The deregistration is handled internally as a full deregistration as if the Agent is shutting down. This cancels the periodic archive timer (which starts anew on re-registration) and also seems to prevent onexit uploads

This can probably be resolved by adding additional internal state to the Agent. Currently the lifecycle can go from REGISTERED (and PUBLISHED) to DEREGISTERED, and back again. Adding state transition chains like REGISTERED -> REFRESHING -> REGISTERED (on success) and REGISTERED -> REFRESHING -> DEREGISTERED (on failure) should allow more nuanced handling of the discovery ping signal so that these bad effects can be avoided until distinctly necessary.
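
The proposed transitions can be sketched as a small state machine. `RegistrationLifecycle` is a hypothetical class. It folds PUBLISHED into REGISTERED for brevity and is not the Agent's actual implementation.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class RegistrationLifecycle {
    enum State { DEREGISTERED, REGISTERED, REFRESHING }

    // Allowed transitions, including the proposed REFRESHING chains.
    private static final Map<State, Set<State>> ALLOWED = Map.of(
            State.DEREGISTERED, EnumSet.of(State.REGISTERED),
            State.REGISTERED, EnumSet.of(State.REFRESHING, State.DEREGISTERED),
            State.REFRESHING, EnumSet.of(State.REGISTERED, State.DEREGISTERED));

    private State state = State.DEREGISTERED;

    synchronized State state() {
        return state;
    }

    /** Moves to the next state, rejecting transitions that are not in the table. */
    synchronized void transition(State next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    /** The harvester keeps running through a refresh and stops only when deregistered. */
    synchronized boolean harvesterShouldRun() {
        return state != State.DEREGISTERED;
    }
}
```

A discovery ping would then drive REGISTERED -> REFRESHING rather than a full deregistration, so the periodic archive timer and on-exit upload are only torn down if the refresh fails.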

[Task] Intercept `SIGTERM` and upload recording before clean shutdown

Implementation to support cryostatio/cryostat#1013

The Agent should be able to handle SIGTERM and perhaps other related signals which may be sent by the host OS or container platforms when the JVM should exit. There will normally be some grace period before the platform SIGKILLs the JVM to allow for graceful cleanup. The Agent should attempt to send an exit recording file dump to Cryostat within this window. The time required to send the dump is variable (size of recording, speed of link between Agent and Cryostat) so this will be a best-effort attempt, but there may be techniques to try to get it done as quickly as possible. For example, perhaps there should be a configuration to allow attempting to zip these files before sending them, in case the deploying user knows the network link may be slow but the JVM may have the CPU to spare. The Agent may also expose configuration knobs to the user allowing them to configure the specific maxAge/maxSize for the data within these dump files to try to get the upload time within the window. The graceful shutdown period is also likely to be a configurable knob on the platform that the user can experiment with.
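
A best-effort exit upload bounded by a deadline might look like the sketch below. It uses a JVM shutdown hook (`Runtime.addShutdownHook`), which the JVM runs on SIGTERM, rather than a dedicated signal handler, so it is a simplification and not necessarily how the Agent does it. `ExitUploadSketch`, `runWithDeadline`, and `install` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ExitUploadSketch {
    /** Runs upload on a daemon thread; returns true only if it finished within deadlineMs. */
    static boolean runWithDeadline(Runnable upload, long deadlineMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "exit-upload");
            t.setDaemon(true); // never keep the JVM alive for the upload
            return t;
        });
        Future<?> result = executor.submit(upload);
        try {
            result.get(deadlineMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // give up so shutdown can proceed
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Registers the bounded upload to run when the JVM begins shutting down. */
    static void install(Runnable upload, long deadlineMs) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> runWithDeadline(upload, deadlineMs), "exit-upload-hook"));
    }
}
```

The deadline would need to be shorter than the platform's grace period (10 seconds in the podman example elsewhere on this page), and the maxAge/maxSize knobs mentioned above are what would keep the upload small enough to fit.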

[Task] Set up CI and automations

Like other Cryostat projects, we should have CI and various automations around PRs to help ensure code quality. The CI can also produce the agent fat-JAR as an artefact and publish that somewhere for consumption.

[Bug] Agent registration flow after Cryostat restarts

The Agent's startup registration flow is as follows:

  1. Start up, check all required environment variables. Fail if any not defined.
  2. Generate authorization credentials for own webserver
  3. Create a matchExpression tailored to itself (target.jvmId == '$myHashId'), publish a stored credential to Cryostat using this expression and the generated credentials. Retry periodically until this succeeds.
  4. Register as a discovery plugin. Retry periodically until this succeeds.
  5. Publish a description of itself as the single node under its plugin registration.

This is mostly resilient on the Cryostat side against various conditions where communications between the Agent and Cryostat break down, or Cryostat goes down temporarily, because of the callback URI ping mechanism. Cryostat will attempt to call the plugin back to 1. ensure that it is still there, 2. ensure that the plugin (agent) refreshes its registration token before expiry, and 3. ensure that Cryostat's stored credentials for the plugin are still valid.

However, this can be problematic on the agent side when Cryostat goes down temporarily and the agent stays up. In this case, Cryostat will call back the plugin with the old callback and old stored credentials, which will fail, prompting Cryostat to deregister the plugin. Cryostat will then forget that this plugin ever existed and stop pinging its callback. Meanwhile, the Agent considers itself registered, published, and up-to-date, and will happily sit idle forever, never renewing its registration with Cryostat. If the user stops the agent (its attached application) and restarts it, the flow will resume from the top and everything should be registered and published again. A properly resilient system, however, would reach this state again on its own whenever either Cryostat or the agent goes down temporarily and reappears later.

Moreover, if the Agent does notice that it is no longer registered with Cryostat and attempts to re-register itself, it will fail at step 3, because it will regenerate an identical matchExpression, which Cryostat will reject because it expects matchExpressions to be unique. Cryostat should communicate this specific failure mode (e.g. via response status code) in a way that the agent recognizes, so that it can continue to step 4. Other failure modes should still block progression and probably lead to retrying step 3.
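
The decision described above could look something like the following sketch. It assumes, purely for illustration, that Cryostat would signal the duplicate matchExpression with HTTP 409 Conflict; the actual status code is not decided in this issue, and the class and enum names are hypothetical.

```java
/** Hypothetical sketch of how the Agent might react to the step 3 response. */
final class CredentialSubmissionSketch {

    enum NextStep {
        PROCEED_TO_REGISTRATION, // continue to step 4
        RETRY_CREDENTIALS // stay on step 3 and retry later
    }

    // Assumption: Cryostat reports an already-existing matchExpression as 409 Conflict.
    static final int ASSUMED_DUPLICATE_STATUS = 409;

    static NextStep afterCredentialSubmission(int statusCode) {
        boolean created = statusCode >= 200 && statusCode < 300;
        boolean alreadyStored = statusCode == ASSUMED_DUPLICATE_STATUS;
        // A duplicate means the credential is already in place, so registration can go ahead.
        return (created || alreadyStored)
                ? NextStep.PROCEED_TO_REGISTRATION
                : NextStep.RETRY_CREDENTIALS;
    }
}
```

Any other response, such as a 5xx while Cryostat is still starting up, keeps the Agent on step 3.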

[Story] Ability to configure Agent to listen to MBean metrics and dynamically start recording

As a user, I would like a way to configure the Agent to listen to selected MBean metrics. When some condition is met regarding that metric, the Agent should dynamically start a JFR recording. The event template used for the recording should also be a related configuration option. Ideally, the trigger can have multiple conditions, e.g. CPU usage above X% for Y duration AND physical memory usage above A% for B duration. Multiple configurations can be stacked together, which would logically OR the conditions. The configurations must be provided to the Agent by some static means, e.g. environment variables, not by dynamic methods such as accepting a configuration file over the Agent HTTP API. This probably means continuing to use the SmallRye Config properties, which allow env var, system property, or property file specification. A syntax for the conditions will need to be devised that can fit into the properties format.

It should also be possible to configure both the existing periodic upload and smart triggers. For example, as a user, I might want to have an always-on recording with the Continuous template that uploads every 30 minutes, as well as smart triggers that start a Profiling recording when some conditions are met and upload every 5 minutes, and both of these scenarios should run concurrently.

[Epic] Two-way communications protocol

The agent currently exposes only a readonly HTTP API that the Cryostat server can use to query basic information such as a list of active recording descriptors, the JFR event types and templates available, and MBean metrics data.

The agent should also implement mutation requests for actions such as dynamically starting Flight Recordings, including with custom event templates supplied by the server with the request.

The readonly API should remain implemented and available. We should explore whether the mutable (write/update) API endpoints are always available, or if they are gated behind an additional opt-in property that requires the user to explicitly enable them.

The goal once this project is completed is that the Cryostat backend's client should be able to make all of its requests to an HTTP-registered Agent instance just as it would for a JMX target.

[Bug] Discovery self-node has wrong annotation port number when publishing JMX URL

In Registration.java:

        URI uri = callback;
        if (preferJmx && jmxPort > 0) {
            uri =
                    URI.create(
                            String.format(
                                    "service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi",
                                    hostname, jmxPort));
        }
        DiscoveryNode.Target target =
                new DiscoveryNode.Target(
                        realm,
                        uri,
                        appName,
                        jvmId,
                        pid,
                        hostname,
                        uri.getPort(),
                        javaMain,
                        startTime);

The uri.getPort() call here returns the expected port number when the registration connectUrl is the agent's own HTTP connection. When the registration uses JMX, however (i.e. when the registration.prefer-jmx config property is set), it returns -1 instead of the actual JMX port number.
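
The -1 comes from java.net.URI treating a service:jmx:... string as an opaque URI (the part after the scheme does not begin with a slash), so no authority or port is parsed from it. One possible fix, sketched below with a hypothetical helper name, is to pass the already-known jmxPort whenever the JMX URL is the one being published, and to fall back to uri.getPort() otherwise.

```java
import java.net.URI;

/** Hypothetical helper: picks the port to annotate on the discovery node. */
final class JmxPortSketch {

    static int annotationPort(URI uri, boolean preferJmx, int jmxPort) {
        // Mirrors the condition in Registration.java that selects the JMX service URL.
        if (preferJmx && jmxPort > 0) {
            return jmxPort;
        }
        return uri.getPort();
    }

    public static void main(String[] args) {
        URI jmx = URI.create("service:jmx:rmi:///jndi/rmi://localhost:9091/jmxrmi");
        System.out.println(jmx.isOpaque()); // true
        System.out.println(jmx.getPort()); // -1
        System.out.println(annotationPort(jmx, true, 9091)); // 9091
    }
}
```

The hostname and port 9091 above are placeholder values for the demonstration only.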

[Epic] Smart triggers

The Agent should listen to metrics and events within the attached JVM and use these events to decide when to automatically start/stop JFR recording.

[Task] Devise a trigger condition syntax and implement processing from config

Idea 1

The syntax should map easily to SmallRye Config, so it should be understandable when expressed as an environment variable name, as well as when expressed as a system property or properties file entry.

The syntax must also be able to specify:

  • the metric to observe
  • a condition about that metric
  • a trigger value for the condition
  • optional parameters such as a duration threshold that the condition must persist for to trigger
  • the name of a .jfc event template file that should be used when the trigger starts a recording

For example, <Process CPU Load (%), Value Greater Than, 0.2, For 30 Seconds> is a 4-tuple that could be expressed in such a syntax, and should result in a self-explanatory recording trigger. As a very rough example, this might be expressed as PROCESSCPU_GT_20_30S=profiling.jfc. This simple environment variable could be split on _ characters. The first field would be used to look up in a table which metric should be observed, and then implementation-specific details for each metric would yield an observation function. The second field, GT, would be matched against supported operations, in this case the > comparator, to apply to the result of the observation function and the parsed value of the third field, 20. In this case the third field could be interpreted as hundredths, i.e. 0.20. The optional fourth field, 30S, would be interpreted as the duration threshold, so the implementation would need to maintain samples over time and check the condition across all of the samples before triggering the recording.
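
To make the splitting concrete, here is a minimal parsing sketch for the rough PROCESSCPU_GT_20_30S=profiling.jfc form. Everything in it is hypothetical: the class, the field names, the LT operator (only GT appears in the example), and the choice to read the duration field via java.time.Duration by prefixing "PT". It covers only parsing and a single-sample comparison, not the metric lookup table or the sampling over time.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical holder for one parsed trigger definition. */
final class TriggerSketch {

    enum Op { GT, LT } // GT is from the example; LT is an assumed extra operator

    final String metricKey; // looked up later in a metric table, e.g. PROCESSCPU
    final Op op;
    final double threshold; // third field interpreted as hundredths
    final Duration sustainedFor; // Duration.ZERO when the optional field is absent
    final String template; // e.g. profiling.jfc

    private TriggerSketch(String metricKey, Op op, double threshold,
            Duration sustainedFor, String template) {
        this.metricKey = metricKey;
        this.op = op;
        this.threshold = threshold;
        this.sustainedFor = sustainedFor;
        this.template = template;
    }

    /** Parses a variable name such as PROCESSCPU_GT_20_30S and its value. */
    static TriggerSketch parse(String name, String value) {
        String[] fields = name.split("_");
        if (fields.length < 3 || fields.length > 4) {
            throw new IllegalArgumentException(
                    "Expected METRIC_OP_VALUE[_DURATION] but got: " + name);
        }
        Op op = Op.valueOf(fields[1].toUpperCase(Locale.ROOT));
        double threshold = Integer.parseInt(fields[2]) / 100.0;
        // "30S" becomes the ISO-8601 duration "PT30S"
        Duration sustainedFor =
                fields.length == 4 ? Duration.parse("PT" + fields[3]) : Duration.ZERO;
        return new TriggerSketch(fields[0], op, threshold, sustainedFor, value);
    }

    /** Checks one observed sample against the condition. */
    boolean conditionHolds(double observed) {
        return op == Op.GT ? observed > threshold : observed < threshold;
    }
}
```

A caller would still need to keep samples for the sustainedFor window and start the recording only if conditionHolds was true for all of them.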


Idea 2

Rather than specifying the trigger conditions as SmallRye Config properties, a more freeform syntax could be used and passed to the agent as an argument.

https://docs.oracle.com/javase/8/docs/api/java/lang/instrument/package-summary.html

$ java -javaagent:/path/to/cryostat-agent.jar=argumentstring

where argumentstring could be more flexible and easier to parse than environment variable names, since more characters should be permissible to help delineate fields. This could also be used to pass the agent a path to something like a JSON file containing a serialized form of the triggers to be used.
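
For reference, the JVM hands argumentstring to the agent's premain method as its first parameter. The sketch below shows one way the Agent might tell the two uses of the argument apart, an inline definition versus a path to a file. The class name, the Source enum, and the rule "an existing regular file means a file of triggers" are assumptions made for illustration.

```java
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical sketch of classifying the -javaagent argument string. */
final class AgentArgsSketch {

    enum Source { NONE, FILE, INLINE }

    // Invoked by the JVM for -javaagent:agent.jar=argumentstring; agentArgs is the text after '='.
    // A real agent class would be public and named in the JAR manifest's Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("trigger source: " + classify(agentArgs));
    }

    static Source classify(String agentArgs) {
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return Source.NONE;
        }
        try {
            if (Files.isRegularFile(Paths.get(agentArgs))) {
                return Source.FILE; // e.g. a JSON file of serialized triggers
            }
        } catch (InvalidPathException e) {
            // Not a valid path on this platform, so treat it as an inline definition.
        }
        return Source.INLINE;
    }
}
```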

[Bug] Server stored credentials should include JVM ID for uniqueness

In #133, the previous stored credentials match expression used only the target's JVM ID, which caused a conflict with credentials actually stored for JMX connection use. The new way uses the target's callback URL as the target.connectUrl, which will be an agent HTTP API URL and so will not conflict with JMX credentials. However, in some scenarios, like unclean shutdowns of the Agent and/or Cryostat, stale Credentials entries may be left behind in the encrypted keyring, and these may have the same connectUrl as the (potentially restarted, or at least re-registering) Agent instance. This will cause another conflict and prevent Agent registration. To help mitigate this, Agents can store their credentials with an expression matching on both the callback/connectUrl and the JVM ID.
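
A combined expression might be built along these lines. This is only a sketch: the helper name is hypothetical, the && operator is assumed to be supported by the match expression language, and the values are interpolated without escaping, which a real implementation would need to consider.

```java
import java.net.URI;

/** Hypothetical helper that builds a match expression on both connectUrl and jvmId. */
final class MatchExpressionSketch {

    static String forAgent(URI callback, String jvmId) {
        // Assumption: the expression language accepts '&&' to combine two equality checks.
        return String.format(
                "target.connectUrl == '%s' && target.jvmId == '%s'", callback, jvmId);
    }
}
```

With a callback of http://agent.example:9977/ and a JVM ID of abc123 (both placeholder values), this yields target.connectUrl == 'http://agent.example:9977/' && target.jvmId == 'abc123'.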
