ampproject / error-tracker Goto Github PK
View Code? Open in Web Editor NEWAMP Project's error logging server
License: Apache License 2.0
AMP Project's error logging server
License: Apache License 2.0
Hi there,
I was reading the source code of this repository for a project of mine and came across this line of code: routes/error-tracker.js#L102
If you follow where the value of req.body
flows in the code, you can can see that it ends in a call to deleter()
in utils/cache.js
. So my question is this: which function is responsible for sanitizing req
? It can be potentially exploited (or just result in an undesired behavior). I could not find information about it in the repository so I apologize if I misread or overlooked something.
Thanks
Context: 90% of errors reported by clients on the Stable release channel are throttled by the backend service and never logged. By default, GCP has a 10 GHz-seconds limit on total CPU allocation for Cloud Functions invocations. A recent surge in errors from a (fixed) high-frequency error pushed us over the edge, and we hit our daily quota for CPU allocation.
For now, I've requested a quota increase to better handle load spikes like this. To reduce some of that request load, we can move the 90% throttling of Stable traffic to the client. This can be a very simple check implemented by the runtime that would drastically reduce backend load.
/cc @jridgewell
TypeError: Cannot read property 'includes' of undefined
at /app/routes/error-tracker.js:74:46
at Array.some (native)
at ignoreMessageOrException (/app/routes/error-tracker.js:73:25)
at getHandler (/app/routes/error-tracker.js:128:7)
Might be a bug in Closure Compiler's source maps.
/cc @jridgewell
RuntimeError: memory access out of bounds
at wasm-function[18]:5277
at wasm-function[27]:66
at BasicSourceMapConsumer._parseMappings (/app/node_modules/source-map/lib/source-map-consumer.js:329:47)
at BasicSourceMapConsumer._getMappingsPtr (/app/node_modules/source-map/lib/source-map-consumer.js:315:12)
at _wasm.withMappingCallback (/app/node_modules/source-map/lib/source-map-consumer.js:510:14)
at Object.withMappingCallback (/app/node_modules/source-map/lib/wasm.js:95:11)
at BasicSourceMapConsumer.originalPositionFor (/app/node_modules/source-map/lib/source-map-consumer.js:508:16)
at unminifyFrame (/app/utils/unminify.js:79:49)
at stack.map (/app/utils/unminify.js:153:36)
at Array.map (<anonymous>)
From the Oxidizing Source Maps with Rust and WebAssembly article, the author seems aware of this:
With the WebAssembly implementation, Chrome would erroneously throw RuntimeError: memory access out of bounds. Using Chrome’s debugger, the supposed out-of-bounds access was happening in an instruction sequence that does not exist in the .wasm file. All other browser’s WebAssembly implementations successfully ran the benchmark with this input, so I am inclined to believe it is a bug in the Chrome implementation.
We should consider downgrading the source-map
dep to see if it'll fix the issue.
To see what happens to your code in Node.js 10, Greenkeeper has created a branch with the following changes:
.travis.yml
package.json
files was updated to the new Node.js versionIf you’re interested in upgrading this repo to Node.js 10, you can open a PR with these changes. Please note that this issue is just intended as a friendly reminder and the PR as a possible starting point for getting your code running on Node.js 10.
Greenkeeper has checked the engines
key in any package.json
file, the .nvmrc
file, and the .travis.yml
file, if present.
engines
was only updated if it defined a single version, not a range..nvmrc
was updated to Node.js 10.travis.yml
was only changed if there was a root-level node_js
that didn’t already include Node.js 10, such as node
or lts/*
. In this case, the new version was appended to the list. We didn’t touch job or matrix configurations because these tend to be quite specific and complex, and it’s difficult to infer what the intentions were.For many simpler .travis.yml
configurations, this PR should suffice as-is, but depending on what you’re doing it may require additional work or may not be applicable at all. We’re also aware that you may have good reasons to not update to Node.js 10, which is why this was sent as an issue and not a pull request. Feel free to delete it without comment, I’m a humble robot and won’t feel rejected 🤖
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
With the migration to Cloud Functions, several things have to happen to fully migrate error reporting and introduce new features. Since Cloud Functions does not provide built-in traffic migration or versioning, we have a stable and beta channel with different URLs that the Runtime will divert between to allow testing new error reporting deployments. Stable is promoted explicitly by pushing a git tag, while beta releases are built automatically from master
on each commit.
The new error reporting endpoints live at new URLs:
https://us-central1-amp-error-reporting.cloudfunctions.net/r
https://us-central1-amp-error-reporting.cloudfunctions.net/r-beta
Migrating to the new error reporting conventions includes a handful of changes:
/r
to the AMP Runtime and slowly migrating error reporting traffic over/r-beta
to the AMP Runtime and migrating a slice of traffic overTo minimize risk of something breaking, and to minimize the impact of the changes on developer workflow, I propose the following plan.
Week of March 30
Week of April 6
This week allows us to confirm the new reporting endpoint reports correctly and scales as needed.
Week of April 13
This week allows us to verify that the new reporting endpoint can scale to production traffic, and validates the 90-10 split to the beta endpoint.
Week of April 20
By this point, we have all errors being reported through the new endpoints, so we can start making changes to service buckets and version names while maintaining enough reporting volume in both the legacy and new buckets to ensure a useful amount of data in one or both at all times during the transition.
Week of April 27
Week of May 4
Week of May 11
/cc @rsimha @ampproject/wg-runtime
I'd like a way to be able to get through the filtering and throttling for debugging purposes.
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
cloudbuild.yaml
.github/workflows/ci.yml
actions/checkout v4
actions/setup-node v4
package.json
@google-cloud/logging 11.0.0
@google-cloud/storage 7.11.1
@jridgewell/trace-mapping 0.3.25
express 4.19.2
http-status-codes 2.3.0
lodash.debounce 4.0.8
node-fetch 3.3.2
safe-decode-uri-component 1.2.2-native
@google-cloud/functions-framework 3.4.0
@types/express 4.17.21
@types/lodash.debounce 4.0.9
@typescript-eslint/eslint-plugin 7.11.0
@typescript-eslint/parser 7.11.0
chai 5.1.1
eslint 8.57.0
eslint-config-prettier 9.1.0
eslint-plugin-chai-expect 3.0.0
eslint-plugin-import 2.29.1
eslint-plugin-sort-destructure-keys 2.0.0
eslint-plugin-prettier 5.1.3
mocha 10.4.0
nock 13.5.4
prettier 3.2.5
sinon 18.0.0
source-map 0.7.4
superagent 9.0.2
This is a tracking FR that can be closed when the default branch of this repo is renamed from master
to main
. For more context, see ampproject/amphtml#32195 and this plan.
I'm not sure what other changes need to go in to amphtml
or amp-github-apps
or elsewhere in order to let error reporting continue to work. Happy to make any changes you think are needed.
/cc @rcebulko and @jridgewell for visibility and guidance on how to proceed.
Currently when stack driver logging experiences an error while logging errors reported to the Error tracker, it dumps the errors it was trying to log. The on error implementation of logging should handle these errors in a safe way without dumping them.
1.2.1-native
to 1.2.1
.This version is covered by your current version range and after updating it in your project the build failed.
safe-decode-uri-component is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.
The new version differs by 22 commits ahead by 22, behind by 17.
6abe20a
1.2.1
f772777
Merge pull request #2 from andrey-sh/master
86847b8
Upgrade packages
ac3cac9
Fixup README
5810dbe
Change all let and const on var
53000fe
1.1.3
422884c
Update README
814d3d0
Reduce number of test cases
08364f8
1.1.1
3ed902f
Fix tests
c8d5ab4
1.1.0
0d7c1c1
Revamp tests
b511c51
Update test error context
76e0c39
Modify UTF8_DATA table to gzip better
db98084
Update readme
There are 22 commits in total.
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
(node:16) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Can't set headers after they are sent.
5.1.3
to 5.2.0
.This version is covered by your current version range and after updating it in your project the build failed.
@google-cloud/logging is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.
The new version differs by 3 commits.
26bac83
chore: release 5.2.0 (#530)
1e8c67f
feat: add path template parsing for billing, organizations, and folders (#529)
4a1883f
build: use config file for linkinator (#524)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
TypeError: Cannot read property 'startsWith' of undefined
at getHandler (/app/routes/error-tracker.js:77:14)
at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
at next (/app/node_modules/express/lib/router/route.js:137:13)
at Route.dispatch (/app/node_modules/express/lib/router/route.js:112:3)
at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
at /app/node_modules/express/lib/router/index.js:281:22
at Function.process_params (/app/node_modules/express/lib/router/index.js:335:12)
at next (/app/node_modules/express/lib/router/index.js:275:10)
at expressInit (/app/node_modules/express/lib/middleware/init.js:40:5)
at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
Right now RTVs are used to reference release versions:
console
message in AMP pagesIn many of these contexts, a readable form of this version could be more useful for communication, understanding, and cross referencing between threads, errors, etc. I'm leaning towards a mapping like the following:
'052004030010070' --> '04-03 Nightly-Control (0010)'
'012004030010000' --> '04-03 Stable (0010)'
'012004030010001' --> '04-03 Stable (0010+1)'
'012004030010002' --> '04-03 Stable (0010+2)'
'022004030010070' --> '04-03 Control (0010)'
'002004172112280' --> '04-17 Experimental (2112)'
'032004172112280' --> '04-17 Beta (2112)'
'042004210608300' --> '04-21 Nightly (0608)'
These version strings
+1
, +2
, etc.These strings should be easier to remember, understand/parse, and communicate about. The fingerprints could act as a shorthand in discussions.
"We promoted Stable from 0010 to 2112, but 2112 was borked so we rolled back to 0010 until we could validate cherry-picking #1337 to 0010+1 fixed the issue. The bugfix #1337 is already in 0608"
I think some version of this stringification would improve clarity in most or all of the locations above. In some places (ex. release tracking issue), it would make sense to include both the stringified form and the raw RTV.
The service has code to handle each, but neither is used in https://github.com/ampproject/amphtml/blob/master/src/error.js and AFAICT neither shows up in any of the logs.
/cc @jridgewell Are you familiar with these by chance?
Currently, this application is run on Google App Engine (Flexible) with automatic instance scaling between 4 and 100 instances (see app.yaml), typically floating around ~20 instances.
As a result of settings that have been deprecated since the error-tracker was last deployed, it's currently in a state where re-deploying causes a breakage related to GAE's readiness and liveness checks. @jridgewell and I have tried without success to identify the source. For an example of the impact of this, LTS errors can't currently be logged, since the update to support them is seemingly impossible to deploy
One way to simplify the deployment process and unblock updates to this app is to move to a serverless deployment to Cloud Functions. As can be seen in the app entrypoint, the server defines very few endpoints (health checks which respond 200 and /r
which delegates work to an errorTracker
handler). It would be relatively straightforward to instead package these handler functions as two Cloud Functions--or even just one, since the health check would no longer be relevant.
Currently each GAE instance of the app fetches https://cdn.ampproject.org/rtv/metadata to identify which RTVs are currently being served, and drop errors from all older RTVs. The app uses a 5-50min cache for the diversion list, and these caches can at times be out-of-sync across the instances. It also caches source maps for 2 weeks to provide unminified line numbers in stack traces.
Cloud Functions are executed independently with no guarantees about the environment or local state. Initially, some form of Cloud Storage or Memorystore were considered to hold this RTV cache and share it across functions. However, according to the Optimizing Networking section, global state is often maintained between function invocations.
Cloud Functions spins up as many instances as required to handle the load, and only the exported function is called. Dependency loading, and creation/instantiation of global variables, happens only a) at the first function deployment and b) when new instances are started to scale up. In other words, if the cache is declared in the global scope outside the handler function, it can still be shared by function invocations on that instance. We will need to test/monitor the behavior of this scaling, but it appears the behavior will be similar to GAE instance scaling.
The same logic applies to the keys fetched at startup, though it may be possible to instead embed the keys directly in the Cloud Function configuration in Pantheon, eliminating the need to fetch them explicitly
Cloud Functions have a distinct handler URL of the form https://[REGION]-[PROJECT_NAME].cloudfunctions.net/[FUNCTION_NAME]
. This means to actually use a serverless deployment, error-reporting code in amphtml
(and any other projects that report to the app currently) would need to be updated to use the new URL.
We would likely want to do this behind a flag in the Experimental build, or configure error reporting to report to the new URL X% of the time and ramp up, so we can migrate to the new deployment gradually. Despite the clients reporting to two different endpoints, the errors themselves would still end up in the same place, so this migration should not impact our ability to monitor or assess error levels.
One alternative was to forward requests from the App Engine app to the Cloud Functions URL, but since we can't update the GAE app, that idea is moot.
/cc @ampproject/wg-infra @ampproject/wg-runtime
Might need to update the file name to run.
Currently, the service buckets for Beta and Experimental are called "CDN/Origin Development". The meaning of "Development" isn't clear to all engineers. A more obvious choice would be "CDN/Origin 1%" (or "CDN/Origin 1-Percent"), since that's how most people think about the Beta/Experimental release channels anyway.
/cc @ampproject/wg-infra @jridgewell
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.