elastic / apm-agent-nodejs
Elastic APM Node.js Agent
Home Page: https://www.elastic.co/guide/en/apm/agent/nodejs/current/index.html
License: BSD 2-Clause "Simplified" License
So excited to meet these fantastic tools!! I can already capture metrics for HTTP requests. However, I am still confused about how to capture the CPU and memory usage of the agent's hosts. Is there some way to get it?
The `captureError` function can take 3 types of arguments in the first position: an `Error` object, a message string, or a special parameterized object. The latter is currently undocumented.
If `captureError` is passed an object instead of an error or a string, we expect it to be in the format of `{ message: '...', params: [] }`.
Example usage:

```js
agent.captureError({
  message: 'Could not find user %s with id %d in the database',
  params: ['Peter', 42]
})
```
This makes it possible for us to better group error messages that contain variable data like IDs or names.
Associate all logged errors with the transaction that was active at the time the error occurred.
This is done by adding the following property and object to the root of the error JSON document sent to the intake API:
```json
"transaction": {
  "id": "<uuid>"
}
```
Related APM Server PR: elastic/apm-server/pull/437
Today codecov will post a pretty lengthy comment on PRs. Takes up a lot of real estate 😞
Instead we should do what the Python agent is doing and just have it change the status of the PR:
See the codecov.yml file from the Python agent for inspiration.
Currently we send up process related data to the APM Server in the `service` JSON object (both for errors and transactions). These are `service.pid`, `service.process_title` and `service.argv`. We should instead move these properties to their own `process` namespace:
```json
"process": {
  "pid": 1234,
  "title": "node",
  "argv": [
    "node",
    "index.js"
  ]
}
```
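The values themselves are readily available on Node's global `process` object. A minimal sketch of building the proposed namespace (the payload assembly here is illustrative, not the agent's actual code):

```javascript
// Build the proposed `process` namespace from Node's global process object.
// Sketch only -- the agent's real payload assembly lives elsewhere.
var processPayload = {
  process: {
    pid: process.pid,     // number, e.g. 1234
    title: process.title, // string, e.g. 'node'
    argv: process.argv    // array, e.g. ['node', 'index.js']
  }
}

console.log(typeof processPayload.process.pid) // 'number'
```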
Related APM Server PR: elastic/apm-server#445
There's an issue in the current instrumentation of Node.js 8 and above that means that some traces will not be collected.
We're working on a fix. Please subscribe to this issue to stay up to date.
I misconfigured the APM Server URL by accident: `serverUrl: 'https://localhost:82000'`.
This led to countless retries of sending data to the APM Server, see logs:
```
logging error 6a9ff729-cec0-45d5-bc5e-a57e27f2a25d with Elastic APM
logging error 54c4eb0f-5a98-4b41-b07e-5b4f60a2ba5f with Elastic APM
logging error 8cbe19af-37fb-468b-b9bf-8070b5531ca4 with Elastic APM
logging error f9d805e8-488f-4a41-b4f6-20154d8101fe with Elastic APM
logging error 53df0b4b-2cd0-419c-8af4-27be7f36d716 with Elastic APM
logging error 80fd91b6-c09f-48cb-8787-b8ac0d1d1775 with Elastic APM
logging error c9858407-17a3-4968-8cce-297e0bd31f3d with Elastic APM
logging error b3a69e7b-8eec-4580-a219-e72eb4e5b637 with Elastic APM
logging error 6935b29d-9629-4dc2-8b62-9954e6559693 with Elastic APM
logging error 6e31e7cb-4a2d-44e1-bb1c-279bef550de5 with Elastic APM
logging error cf647462-65e1-48e3-a3c4-15a3779bce6f with Elastic APM
logging error 79e1590c-bdad-4c41-b97e-ccc43189bdbb with Elastic APM
...
```
When I instead misconfigure it to another wrong port, e.g. `8201`, I get a proper `Connection Refused` error.
While poking around in this repo I attempted to follow links in the docs folder, which exist for the purpose of linking to the docs once they have been built. After discussion, it was decided that adding a line to each of the docs/*.asciidoc files (as already exists here) would be beneficial, so others aren't confused.
GraphQL 0.12 has just been released and we need to add support for it.
Currently our test-suite fails if run against this version.
According to the compatibility page we're supposed to support hapi all the way back to version 9. But when running the hapi tests on version 15.0.1 or older of hapi, the test suite fails.
I believe that it used to work for these versions, but the tests were never covered by our TAV test suite, so the issue might have snuck in at some point later when refactoring either the tests or the actual hapi instrumentation code.
Today we have a `setTag(key, value)` function which allows you to set a tag on the current transaction.
As discussed on the APM forum, it would be nice to be able to set more than one tag at a time.
Since the existing `setTag` is singular, I'm thinking it's better to create a new function. We can keep supporting both functions or we can deprecate the old one. I'm not sure what's best.
Proposed new API:

```js
agent.setTags({key1: 'value', key2: 'value'})
```
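A minimal sketch of what `setTags` could look like, implemented on top of the existing singular `setTag`. The `agent` object below is a hypothetical stand-in, not the real agent:

```javascript
// Hypothetical stand-in agent that stores tags the way the real one might.
var agent = {
  _tags: {},
  setTag: function (key, value) {
    this._tags[key] = String(value)
  },
  // Proposed plural API: loop over the object and reuse setTag, so calling
  // it multiple times adds to (and overwrites within) the existing tags.
  setTags: function (tags) {
    Object.keys(tags).forEach(function (key) {
      this.setTag(key, tags[key])
    }, this)
  }
}

agent.setTags({key1: 'value', key2: 'value'})
console.log(agent._tags) // { key1: 'value', key2: 'value' }
```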
Questions:
- Should we deprecate `setTag` or keep supporting it?
- Should calling `setTags` multiple times just add to the list of tags?

Just as the Python agent, we should upload code coverage reports to codecov.io as part of CI (I imagine only when testing master).
Here's their Node.js example: https://github.com/codecov/example-node
Here's the Python agent codecov.yml file: https://github.com/elastic/apm-agent-python/blob/master/codecov.yml
Currently we store hapi events under `context.custom`. This location is only supposed to be used for storing user provided data. The agent is not supposed to store data here automatically:
Instead we should consider something like this: `context.library.hapi.event = event`
That however would mean custom logic in our Kibana plugin.
Related: #12
Our dependency tests for Node.js v0.12 fail quite often on Travis CI.
The failure is always the same:

```
events.js:85
      throw er; // Unhandled 'error' event
      ^
Error: read ECONNRESET
    at exports._errnoException (util.js:746:11)
    at TCP.onread (net.js:561:26)
```
https://travis-ci.org/elastic/apm-agent-nodejs/jobs/268019944
It happens when the test-all-versions module tries to install a dependency for testing, i.e. before the actual test runs. This leads me to believe that it's the TCP connection to the npm registry that fails.
If we run the tests with longjohn in an attempt to get a more meaningful stack trace, it looks like this (which isn't much help):
```
Error: read ECONNRESET
    at exports._errnoException (util.js:746:11)
    at TCP.onread (net.js:561:26)
---------------------------------------------
    at fireErrorCallbacks (net.js:458:15)
    at TLSSocket.Socket._destroy (net.js:497:3)
    at TCP.onread (net.js:561:17)
```
https://travis-ci.org/elastic/apm-agent-nodejs/jobs/266829144
I haven't been able to recreate this issue locally and I think it's related to the specific Linux or machine setup Travis is using for these tests, combined with Node.js v0.12. I don't remember seeing these failures until after we switched on `sudo: required` in the .travis.yml file (was `sudo: false` previously).
I can't find any place where we or a dependency don't listen for the `error` event, and this only happens in Node.js v0.12. This leads me to believe that it might actually be a bug in Node itself.
Possible options to solve this:
- Switch back to `sudo: false` (this might make the tests run at about half speed because the containerized setup on Travis isn't as fast)

Since "app" is being renamed to "service", the `in_app` property on stack trace frames doesn't make sense any more. Calling it `in_service` can be really confusing, so it's been decided to invert the meaning of it and call it either `library_frame` or `lib_frame`. So whenever `in_app` was `true` previously, this new property should be `false` and vice versa.
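The migration is a simple per-frame inversion. A sketch (the frame object shape is assumed for illustration):

```javascript
// Invert the old `in_app` flag into the proposed `library_frame` flag.
// The frame shape here is an assumption for illustration only.
function renameFrameFlag (frame) {
  frame.library_frame = !frame.in_app
  delete frame.in_app
  return frame
}

console.log(renameFrameFlag({ in_app: true, filename: 'index.js' }))
// { filename: 'index.js', library_frame: false }
```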
Related APM Server issue: elastic/apm-server#384
APM so far looks freaking awesome! With that in mind, I would also like to instrument my Lambda functions/services/applications.
I'm using the serverless-framework for developing and deploying nodejs functions to AWS Lambda.
I have come so far as to conclude that code within a Lambda function is executed after Node.js starts the transaction. Therefore calling apm-agent-nodejs within the Lambda function never records/starts the transaction.
When testing locally with `serverless offline`, the very first transaction is not recorded, but since the server keeps running with APM loaded, the subsequent requests generate transactions.
Is there a way to "hook" onto an already ongoing Node.js request/transaction and gather whatever information and timings are visible within the function's own bubble?
A major part of our solution relies on WebSocket.
Just curious, are there plans to support WebSocket?
The meaning of the `uncaught` property on errors should be flipped and renamed to `handled`. This is more programming language neutral.
Even though we landed a commit in master that should prevent CodeCov from posting comments on PRs and instead use GitHub status indicators, we still get the comments and the status indicators are nowhere to be found.
More research is needed 😄
If calling `captureError` with either a string or the special parameterized object, the message will be stored as `log.message`, but a stack trace will not be captured along with it. This should be added to `log.stacktrace`, just as it happens with `exception.stacktrace` for regular errors.
As we're not dealing with an actual error object, we'll have to capture the stack trace ourselves. When we do, we should pop the top stack frame(s) (as those will point to the agent), so that the new top stack frame is the place from where `captureError` was called.
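V8 can do the frame-popping for us: `Error.captureStackTrace` accepts a function as its second argument, and all frames from that function upward are omitted from the trace. A sketch under that assumption (the payload shape is illustrative):

```javascript
// Capture a stack trace for a non-Error value, excluding the agent's own
// frames: passing `captureError` as the second argument makes V8 omit
// captureError itself (and everything above it) from the trace.
function captureError (message) {
  var holder = {}
  Error.captureStackTrace(holder, captureError)
  // holder.stack's frames now start at the caller of captureError
  return { log: { message: message, stacktrace: holder.stack } }
}

var captured = captureError('Could not find user')
console.log(captured.log.stacktrace.indexOf('at captureError') === -1) // true
```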
Having had a quick chat with @watson and @Qard we said we should add a CONTRIBUTING.md file. A question I didn't think to ask at the time is if we ask community members to sign the Elastic Contributor License Agreement, as we do with Elasticsearch and other projects. Thoughts @watson, @Qard ?
There has long been an issue with the latest version of Node.js 7 (v7.10.x) that means we cannot instrument it properly, as all `net.connect` calls use an internal function call that cannot be patched. This issue was fixed in 8.x, but the fix was never back-ported to the 7.x branch. So we only support up to version 7.9.0 on the 7.x branch.
Currently we have pinned our CI servers to test against Node.js version 7.9.x instead of 7.x, but that's not really a realistic scenario as people who still run version 7 would most likely run the latest version.
So maybe we should consider completely dropping support for version 7 (and 5 while we're at it) so that we don't waste time testing these versions on CI?
For reference, here are the official end-of-life dates for all the major Node.js versions:
If you're running Express with Elastic APM, you currently need to add our custom error catching middleware manually.
This has confused quite a few people and is unique to Express compared to our setup instructions for instrumenting other Node.js frameworks, so it would be nice if the middleware was added automatically when starting the agent, similar to how Express is patched in other ways.
At the moment we don't send up `param_message` if there are no placeholders.
It's useful to have `param_message` always be around, even when it has no placeholders. It allows people to make aggregations in Elasticsearch across the `param_message` in a consistent way, instead of having to use a script to fall back to `message` when `param_message` is missing.
To run Elastic APM on multiple environments of the same service, you'd previously normally add a suffix to the `serviceName`, e.g. `my-service-staging` or `my-service-prod`.
In Elastic APM 6.2 we'll introduce a new `environment` property tailored specifically for this. This means that users should use the same `serviceName` (`my-service`), and instead specify the environment via a separate property whose value should be sent up to the APM Server using `service.environment` in the payload.
We need to settle on the name of the config option, but we'll probably name it either `env` or `environment`.
Example usage:

```js
require('elastic-apm-node').start({
  serviceName: 'my-service',
  env: 'production'
})
```
The associated environment variable would then be either `ELASTIC_APM_ENV` or `ELASTIC_APM_ENVIRONMENT`, depending on what name we end up choosing.
Related APM Server PR: elastic/apm-server#366
Since we recently dropped support for versions of Node.js older than v4, I've been wondering if there would be a benefit to switching to using `const` and `let` internally instead of `var`.
I personally don't mind using `var`, and as far as I know, up until recently V8 would actually not perform as well with `const` as it would with `var`. But I expect this has since been improved upon, so it might be worth looking into.
There's of course also the option of just starting to use `const` and `let` in new code, but not going back and refactoring the old code. It would be kind of an eyesore though, so I hope it's easy to switch everything in one big monster commit.
If we decide to switch to using `const` and `let`, we should also make sure to update our documentation code examples to use `const` and `let`.
Please comment with pros and cons below - and please stay civil 😅
Version 3 of the Node.js MongoDB driver has just been released and we need to add support for it, but instead of simply upgrading our existing instrumentation, we should consider changing how we instrument MongoDB.
We currently patch the mongodb-core module, which is the underlying module used by the official mongodb module. Most 3rd party MongoDB modules depend on the main mongodb module, but we decided to target the more low-level mongodb-core module since the mongojs module used to use that directly instead of mongodb. Recently however mongojs has switched to use the mongodb module as well.
The mongodb module has an official APM API, which means we don't have to patch anything. This means more stability and easier support for future MongoDB releases.
You can read about the MongoDB APM API here:
Investigate failing `tav` tests on Jenkins.

```
-- running test "node test/instrumentation/modules/mysql/pool-release-1.js" with mysql
14:50:33 module.js:491
14:50:33     throw err;
14:50:33     ^
14:50:33
14:50:33 Error: Cannot find module 'bignumber.js'
14:50:33     at Function.Module._resolveFilename (module.js:489:15)
14:50:33     at Function.Module._load (module.js:439:25)
14:50:33     at Function.Module._load (/app/node_modules/require-in-the-middle/index.js:22:24)
```
The agent currently doesn't instrument any templating libraries. The most popular one based on download count that I could find was the handlebars module, so it would make sense to support that first.
The trace (soon to be renamed span) type should be `template` to align with the other agents.
Currently the agent sends up all information about all transactions to the APM Server. In Elastic 6.2 we should allow for "head based random sampling" - i.e. where only a configurable percentage of the total transactions are sampled. The non-sampled transactions will still be sent to the APM Server, but without any spans and without any context.
The config option should be called `transactionSampleRate` and default to 100% - i.e. everything is collected.
It's preferable if we can find a way to disable the wrapping of callbacks when a transaction isn't sampled. This means that there's certain things we cannot do as easily as we'll not have an active transaction available during for instance the request/response life cycle. This can mean that setting the name and result for HTTP based transactions can be difficult and might require a bit of refactoring.
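Head-based sampling boils down to a single random decision made when the transaction starts, plus stripping unsampled transactions before they are sent. A sketch under assumed names (not the agent's actual internals):

```javascript
// Decide once, at transaction start, whether to sample.
// rate is 0..1, where 1 means 100% (the proposed default).
function shouldSample (rate) {
  return Math.random() < rate
}

// Unsampled transactions would still be sent to the APM Server,
// but without any spans and without any context.
function payloadFor (transaction) {
  if (transaction.sampled) return transaction
  return { name: transaction.name, result: transaction.result, sampled: false }
}
```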
Note that this feature will need to go into the elastic-6.2 branch as it can't be released until Elastic 6.2 version is released.
Elastic APM is changing in 6.2 to align more with OpenTracing and modern APM naming conventions. This means that we're renaming trace to span across the entire stack.
This includes both function names such as `buildTrace` and APM Server API property names like `traces`.
Currently, whenever a stack trace is being sent to the APM Server, the agent will include the source code around each of the lines in the stack trace.
While this feature does use an in-memory LRU cache, it would probably make sense to add a config option that completely disables this feature. This will reduce both the payload size and the processing time.
Config option naming idea: `sourceContext` (environment variable: `ELASTIC_APM_SOURCE_CONTEXT`)
Example of use:

```js
require('elastic-apm-node').start({
  appName: 'my-app',
  sourceContext: false
})
```
This is a meta issue to collect ideas and improvement suggestions regarding the Jenkins setup.
Elastic APM is changing in 6.2 to align more with OpenTracing and modern APM naming conventions. This means that we're renaming app to service across the entire stack.
This includes both config options such as `appName` and the APM Server API property names.
It should no longer be considered behind a feature flag (ff) and should therefore default to being on (`true`). It also used to only add an extra top frame to the existing stack trace, but now it adds a full separate stack trace to the error (currently under the `log` namespace), so the name doesn't make much sense any more either.
What would a good new name be? `captureLocationStackTrace` is a bit wordy and I don't really like it for several reasons, but I can't come up with anything better either.
(Remember to document this new config option in the docs when adding it.)
The docs (https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html#apm-capture-error) state that `error` can be an `Error` object, a message string or a special parameterized object.
I try to capture some custom errors (passed from the client-side to the server as simple strings). I'm using `apm.captureError(error[, options][, callback])`.
If I provide an `Error` object (e.g. `new Error(myCustomMessage)`), everything works as expected (except I try to get rid of the stack trace for that specific capture).
If instead I provide `myCustomMessage` (a simple string) directly to `apm.captureError()`, the error is logged as "N/A" (and the stack trace is also N/A).
Last, the special parameterized object is not described anywhere. What is the structure? I hope this can be useful in my case.
I actually have a stacktrace (captured by the client) that I would love to include in my APM error.
First of all, great work on APM, can't wait to feed it some data!
But I have one issue: the agent mangles the URL given to send data to.
Configure `serverUrl: 'https://my.site.com/apm/'` and it will use `https://my.site.com/`. This breaks the very definition of "URL" as well as many (including mine) common reverse-proxying setups.
I have created a very basic Node.js test app. It is basically just:

```js
var apm = require('elastic-apm').start({
  // Set required app name (allowed characters: a-z, A-Z, 0-9, -, _, and space)
  appName: 'testNode App',
  // Use if APM Server requires a token
  // secretToken: '',
  // Set custom APM Server URL (default: http://localhost:8200)
  // serverUrl: '',
})

var http = require('http');
var server = http.createServer(function (req, res) {
  res.writeHead(200);
  res.end('Test APM');
});

server.listen(8080);
```
When calling `curl localhost:8080` a few times and waiting until the requests get pushed to the server, I only have one document in the database. It looks like it always only sends 1 transaction to the server (at least for the same route).
The APM Server intake API supports sending up language information:

```json
"language": {
  "name": "ecmascript",
  "version": "8"
}
```

Question is what we should put as `name` and `version`?
I guess it makes most sense to log the version of ECMAScript used, but the question is how we extract this information from a Node process.
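For context, what a running Node process can report directly: it exposes engine and runtime versions, but no ECMAScript edition. A sketch of the options (the edition lookup values below are illustrative assumptions, not an authoritative mapping):

```javascript
// What a running Node process can tell us directly (sketch):
var candidates = {
  engine: process.versions.v8, // the V8 version string, not an ES edition
  runtime: process.version     // the Node.js version, e.g. 'v8.9.4'
}

// There is no runtime API that reports the ECMAScript edition, so we'd
// likely need a hand-maintained lookup keyed on the Node major version.
// The values here are illustrative assumptions only:
var esEditionByNodeMajor = { 4: '6', 6: '6', 8: '8' }
var major = parseInt(process.version.slice(1), 10)

var language = {
  name: 'ecmascript',
  version: esEditionByNodeMajor[major] || candidates.engine
}
```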
When we have invalid requests with no JSON body at all, the APM agent crashes with this output to the console.
All requests with `Content-Type: application/json` should actually send at least "{}", but anyway I think the APM agent should handle this properly.
My only reason to guess that it is not handling this properly is actually the error log; if it were an info level log I'd consider it safe.
Before publishing:
- `context.db` format
- `serverUrl` config param

After publishing:
Today there's a bit of confusion about our stack trace limit (i.e. the max number of frames in stack traces we send up to the APM Server).
On one hand we have the `stackTraceLimit` config option, which defaults to `Infinity`.
On the other hand we manually truncate after 300 frames:
apm-agent-nodejs/lib/request.js
Lines 167 to 170 in 66f54d1
This needs to be aligned. I imagine we don't even need to manually truncate if we configure V8 to limit the number of frames for us based on the `stackTraceLimit` option (which I believe we currently do).
But it would make sense to lower the default limit from `Infinity` to something more reasonable. Let's set it to `100`.
When fixing this issue, remember to update the `stackTraceLimit` documentation with the new default limit.
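V8 already supports this cap directly via `Error.stackTraceLimit`, so passing the config value through to V8 should make manual truncation unnecessary. A quick demonstration of the proposed default:

```javascript
// Cap the number of frames V8 collects; stacks are truncated at capture time.
Error.stackTraceLimit = 100

function recurse (n) {
  if (n === 0) throw new Error('deep')
  recurse(n - 1)
}

try {
  recurse(200)
} catch (err) {
  var frames = err.stack.split('\n').slice(1) // drop the 'Error: deep' line
  console.log(frames.length) // 100
}
```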
The `flushInterval` config option defaults to 60 seconds today. That's too long and is a leftover from the Opbeat agent, where all requests from all Opbeat users were sent to one single endpoint. A better default would probably be something like 30 seconds.
When changing the default, remember to update the documentation as well.
Related Discuss post: https://discuss.elastic.co/t/hapi-v17/112103
As it's currently configured, all documentation changes that are merged into the master branch are automatically deployed to this URL: www.elastic.co/guide/en/apm/agent/nodejs
Say we're merging in a PR that contains a new feature. The PR should also contain the documentation for said feature. As soon as it's merged into master, the documentation will go live, but if we don't publish a new version of the module containing that new feature right away, the documentation will be out of sync with the module people get when running `npm install elastic-apm-node`.
It would be convenient if we had a way to deal with that.
Today the Node.js agent has only one set of documentation, but it's possible to have different versions of the documentation for different releases, and even one for stuff that's not yet published. This is all controlled in the conf.yml file in the docs repo:
https://github.com/elastic/docs/blob/320c4229b14a9e82e1d5a231590d7787af033fce/conf.yaml#L762-L772
The configuration lists an array of branches, each of which should be a branch in the agent repository and each of which will result in a separate version of the docs. There's also a special key called `current` which points to the default documentation that should be shown to users unless they choose otherwise.
If there's more than one branch, it will result in a dropdown on the documentation pages from which you can choose the version of the documentation you want to view:
Change the one branch listed in the `branches` array to something other than master, e.g. `latest`, and then have that be a branch that we merge `master` into after we publish a new version of the agent to npm (i.e. no commits directly to the `latest` branch).
Have one branch per major version of the agent and list each of those (+ `master`) in the `branches` array in conf.yml. This is similar to how the Python agent handles it and also how Node.js core manages releases.
If we choose this solution, we need to find a good/easy way to get the commits from master into the release branch and how to do the actual release.
The easiest way (which doesn't allow us to make patch or minor releases to an old major version though) would still be to do the release from the master branch, but then merge it into the latest release branch right after and not allow any commits directly to the release branch. Here, release branches of older major versions would never receive new commits after a new major was released.
Suggestions welcome :)
Please suggest other solutions if there's something I overlooked.
We should find a way to assemble a full URL (https://example.com:8080/path?query=params) from a combination of `req.url`, HTTP headers and socket information.
The full URL should be added to the APM Server payload as the property `context.request.url.full`.
If a full URL cannot be assembled, because we for instance are missing the domain, the `full` property should not be set.
This is where we currently process the URL from requests:
apm-agent-nodejs/lib/parsers.js
Lines 88 to 132 in 66f54d1
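A minimal sketch of the assembly (not the agent's actual parser): derive the protocol from the socket's TLS state, take the host from the `Host` header, and bail out when the domain is missing so `full` is simply not set:

```javascript
// Assemble a full URL from an incoming request, or return undefined
// when the domain can't be determined (so `full` is simply not set).
function assembleFullUrl (req) {
  var host = req.headers && req.headers.host
  if (!host) return undefined
  var protocol = req.socket && req.socket.encrypted ? 'https' : 'http'
  return protocol + '://' + host + req.url
}

console.log(assembleFullUrl({
  headers: { host: 'example.com:8080' },
  socket: {},
  url: '/path?query=params'
}))
// 'http://example.com:8080/path?query=params'
```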
Currently we log an "error" if an incoming HTTP request has been taking longer than 25 seconds but then its TCP socket is suddenly closed without the HTTP request being properly ended:
apm-agent-nodejs/lib/instrumentation/http-shared.js
Lines 32 to 35 in a077ef5
Along with the "error" we log the `abortedTime` - i.e. how long the HTTP request was open for before the socket was closed.
This property is currently logged under what used to be called `extra` (which today is called `context.custom`).
The first issue is that we no longer support the `extra` property, so this will currently not work. The second issue is that the `context.custom` data is only supposed to be supplied by the user and is not supposed to be touched by the agent.
So we need a better place to put this information. Or we need to remove this feature from the Node agent for now, until we figure out a better approach.