opstrace / opstrace
The Open Source Observability Distribution
Home Page: https://opstrace.com
License: Apache License 2.0
$ ./opstrace --version --log-level=debug
e790a06a-ci
2020-12-02T09:38:13.389Z debug: BUILD_INFO_COMMIT: e790a06
2020-12-02T09:38:13.389Z debug: BUILD_INFO_TIME_RFC3339: 2020-12-02 08:12:55+00:00
2020-12-02T09:38:13.390Z debug: BUILD_INFO_HOSTNAME: c918aab90943
2020-12-02T09:38:13.390Z debug: BUILD_INFO_BRANCH_NAME: main
2020-12-02T09:38:13.391Z debug: shut down logger, then exit with code 0
$ ./opstrace create aws jp2 -c ~/dev/opstrace/ci/cluster-config.yaml
2020-12-02T09:38:26.924Z info: rendered cluster config:
{
"data_api_authorized_ip_ranges": [
"0.0.0.0/0"
],
"data_api_authentication_disabled": false,
"metric_retention_days": 7,
"log_retention_days": 7,
"cert_issuer": "letsencrypt-staging",
"env_label": "ci",
"tenants": [
"default"
],
"controller_image": "opstrace/controller:e790a06a-ci",
"node_count": 3,
"aws": {
"zone_suffix": "a",
"region": "us-west-2",
"instance_type": "t3.2xlarge"
},
"cloud_provider": "aws",
"cluster_name": "jp2",
}
2020-12-02T09:38:26.925Z info: Before we continue, please review the set of state-mutating AWS API calls emitted by this CLI during cluster creation: https://go.opstrace.com/cli-aws-mutating-api-calls/e790a06a-ci
Proceed? [y/N] y
...
2020-12-02T10:13:26.405Z info: cluster creation finished: jp2 (aws)
--
$ curl -L https://go.opstrace.com/cli-latest-linux-tbz | tar xjf -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 283 100 283 0 0 484 0 --:--:-- --:--:-- --:--:-- 484
100 1 0 1 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
bzip2: (stdin) is not a bzip2 file.
tar: Child died with signal 13
tar: Error is not recoverable: exiting now
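The failure above is tar receiving something that is not a bzip2 stream (a tiny download, likely an error or redirect page). A minimal sketch of a pre-flight check the CLI download tooling could do, assuming we download to a buffer first; the function name is hypothetical:

```typescript
// Hypothetical pre-flight check for the download step: a valid bzip2
// stream starts with the magic bytes "BZh" followed by a block-size
// digit '1'..'9'. If the server served an HTML error page instead, we
// can fail with a clear message rather than letting tar/bzip2 die with
// a cryptic signal 13.
function looksLikeBzip2(firstBytes: Uint8Array): boolean {
  return (
    firstBytes.length >= 4 &&
    firstBytes[0] === 0x42 && // 'B'
    firstBytes[1] === 0x5a && // 'Z'
    firstBytes[2] === 0x68 && // 'h'
    firstBytes[3] >= 0x31 && // '1'
    firstBytes[3] <= 0x39 // '9'
  );
}
```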
Follow-up to #56. At a minimum: we should reduce the roles applied to the cert-manager and cortex service accounts as much as possible.
When the DNS service login fails with a permanent error (which a 401 is supposed to be: a non-retryable error indicating bad credentials), we have to handle that situation properly.
What currently happens: the error is unhandled, emits a stack trace, and consumes a high-level create
attempt -- until all attempts are exhausted:
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.270Z info: setting up DNS
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.270Z debug: DNSClient.GetAll()
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.342Z error: error during cluster creation (attempt 3):
[2020-12-01T13:54:24Z] Error: Request failed with status code 401
[2020-12-01T13:54:24Z] at createError (/snapshot/build/node_modules/axios/lib/core/createError.js:16:15)
[2020-12-01T13:54:24Z] at settle (/snapshot/build/node_modules/axios/lib/core/settle.js:17:12)
[2020-12-01T13:54:24Z] at IncomingMessage.handleStreamEnd (/snapshot/build/node_modules/axios/lib/adapters/http.js:236:11)
[2020-12-01T13:54:24Z] at IncomingMessage.emit (events.js:327:22)
[2020-12-01T13:54:24Z] at IncomingMessage.EventEmitter.emit (domain.js:485:12)
[2020-12-01T13:54:24Z] at endReadableNT (_stream_readable.js:1224:12)
[2020-12-01T13:54:24Z] at processTicksAndRejections (internal/process/task_queues.js:84:21) {
[2020-12-01T13:54:24Z] config: [Object],
[2020-12-01T13:54:24Z] request: [ClientRequest],
[2020-12-01T13:54:24Z] response: [Object],
[2020-12-01T13:54:24Z] isAxiosError: true,
[2020-12-01T13:54:24Z] toJSON: [Function (anonymous)]
[2020-12-01T13:54:24Z] }
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.342Z error: JSON representation of err: {
[2020-12-01T13:54:24Z] "message": "Request failed with status code 401",
[2020-12-01T13:54:24Z] "name": "Error",
[2020-12-01T13:54:24Z] "stack": "Error: Request failed with status code 401\n at createError (/snapshot/build/node_modules/axios/lib/core/createError.js:16:15)\n at settle (/snapshot/build/node_modules/axios/lib/core/settle.js:17:12)\n at IncomingMessage.handleStreamEnd (/snapshot/build/node_modules/axios/lib/adapters/http.js:236:11)\n at IncomingMessage.emit (events.js:327:22)\n at IncomingMessage.EventEmitter.emit (domain.js:485:12)\n at endReadableNT (_stream_readable.js:1224:12)\n at processTicksAndRejections (internal/process/task_queues.js:84:21)",
[2020-12-01T13:54:24Z] "config": {
[2020-12-01T13:54:24Z] "url": "https://dns-api.opstrace.net/dns/",
[2020-12-01T13:54:24Z] "method": "get",
[2020-12-01T13:54:24Z] "headers": {
[2020-12-01T13:54:24Z] "Accept": "application/json, text/plain, */*",
[2020-12-01T13:54:24Z] "authorization": "Bearer null",
[2020-12-01T13:54:24Z] "Content-Type": "application/json",
[2020-12-01T13:54:24Z] "User-Agent": "axios/0.19.2"
[2020-12-01T13:54:24Z] },
[2020-12-01T13:54:24Z] "transformRequest": [
[2020-12-01T13:54:24Z] null
[2020-12-01T13:54:24Z] ],
[2020-12-01T13:54:24Z] "transformResponse": [
[2020-12-01T13:54:24Z] null
[2020-12-01T13:54:24Z] ],
[2020-12-01T13:54:24Z] "timeout": 0,
[2020-12-01T13:54:24Z] "xsrfCookieName": "XSRF-TOKEN",
[2020-12-01T13:54:24Z] "xsrfHeaderName": "X-XSRF-TOKEN",
[2020-12-01T13:54:24Z] "maxContentLength": -1
[2020-12-01T13:54:24Z] }
[2020-12-01T13:54:24Z] }
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.343Z error: 3 attempt(s) failed. Stop retrying. Exit.
[2020-12-01T13:54:24Z] 2020-12-01T13:54:24.343Z debug: shut down logger, then exit with code 1
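The fix direction could look like the following sketch (not the actual CLI code): classify the HTTP error from the DNS service so that a 401 aborts the high-level retry loop immediately instead of consuming all attempts. The classification boundaries are assumptions:

```typescript
// Sketch: 401/403 are credential problems -- retrying with the same
// token cannot help, so treat them as permanent and abort the create
// loop with an actionable error message. Everything else (5xx, 429)
// is treated as transient and worth retrying.
type ErrorClass = "permanent" | "retryable";

function classifyDnsServiceError(httpStatus: number): ErrorClass {
  if (httpStatus === 401 || httpStatus === 403) return "permanent";
  return "retryable";
}
```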
2020-12-02T09:07:04.532Z info: https://loki.default.jp.opstrace.io/loki/api/v1/labels: got expected HTTP response
2020-12-02T09:07:04.533Z info: All probe URLs returned expected HTTP responses, continue
2020-12-02T09:07:04.533Z info: cluster creation finished: jp (aws)
Show a friendly message pointing to https://jp.opstrace.io/ in that case.
Keep the migrations directory clean with migration squashing before releases
Consumed all high-level retries, didn't break the dependency cycle:
...
2020-12-01T11:07:08.623Z info: Destroying VPC
2020-12-01T11:07:30.531Z info: VPC deletion has started with status: RUNNING
2020-12-01T11:07:37.189Z error: error during cluster teardown (attempt 5):
ApiError: The network resource 'projects/vast-pad-240918/global/networks/jpdev' is already being used by 'projects/vast-pad-<snip>/global/firewalls/k8s-3cc02042af359f14-node-hc'
at new ApiError (/home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/util.js:59:15)
at Util.parseHttpRespBody (/home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/util.js:194:38)
at /home/jp/dev/opstrace/node_modules/@google-cloud/compute/src/operation.js:251:46
at /home/jp/dev/opstrace/node_modules/@google-cloud/compute/src/operation.js:234:7
at /home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/service-object.js:193:13
at /home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/util.js:369:25
at Util.handleResp (/home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/util.js:145:9)
at /home/jp/dev/opstrace/node_modules/@google-cloud/common/build/src/util.js:434:22
at onResponse (/home/jp/dev/opstrace/node_modules/retry-request/index.js:214:7)
at /home/jp/dev/opstrace/node_modules/teeny-request/src/index.ts:325:11 {
code: undefined,
errors: [Array],
response: undefined
}
2020-12-01T11:07:37.190Z error: JSON representation of err: {
"errors": [
{
"code": "RESOURCE_IN_USE_BY_ANOTHER_RESOURCE",
"message": "The network resource 'projects/vast-pad-240918/global/networks/jpdev' is already being used by 'projects/vast-pad-<snip>/global/firewalls/k8s-3cc02042af359f14-node-hc'"
}
],
"message": "The network resource 'projects/vast-pad-240918/global/networks/jpdev' is already being used by 'projects/vast-pad-240918/global/firewalls/k8s-3cc02042af359f14-node-hc'"
}
2020-12-01T11:07:37.191Z error: 5 attempt(s) failed. Stop retrying. Exit.
2020-12-01T11:07:37.192Z debug: shut down logger, then exit with code 1
(Seen locally, was trying to tear down an oldish GCP dev cluster of mine).
If there's a simple and robust way to add a tenant dynamically to a running cluster then I think we should offer that feature soon, from the CLI.
Let's discuss tenant removal separately.
Describe the bug
[2020-12-01T20:13:53Z] 2020-12-01T20:13:53.517Z error: error during cluster creation (attempt 3):
[2020-12-01T20:13:53Z] GaxiosError: Failed to create instance because the project or creator has reached the max instance per project/creator limit.
[2020-12-01T20:13:53Z] at Gaxios._request (/snapshot/build/node_modules/gaxios/src/gaxios.ts:117:15)
[2020-12-01T20:13:53Z] at runMicrotasks (<anonymous>)
[2020-12-01T20:13:53Z] at processTicksAndRejections (internal/process/task_queues.js:97:5)
[2020-12-01T20:13:53Z] at JWT.requestAsync (/snapshot/build/node_modules/google-auth-library/build/src/auth/oauth2client.js:343:18) {
[2020-12-01T20:13:53Z] response: [Object],
[2020-12-01T20:13:53Z] config: [Object],
[2020-12-01T20:13:53Z] code: 403,
[2020-12-01T20:13:53Z] errors: [Array]
[2020-12-01T20:13:53Z] }
Should not consume a high-level retry attempt:
[2020-12-02T21:57:59Z] 2020-12-02T21:57:59.729Z debug: aws sdk: [AWS iam 400 0.332s 0 retries] createRole({
[2020-12-02T21:57:59Z] AssumeRolePolicyDocument: '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::959325414060:role/bk-1294-f61-a-eks-nodes"},"Action":"sts:AssumeRole"}]}',
[2020-12-02T21:57:59Z] RoleName: 'bk-1294-f61-a-cert-manager'
[2020-12-02T21:57:59Z] })
[2020-12-02T21:57:59Z] 2020-12-02T21:57:59.734Z error: error during cluster creation (attempt 1):
[2020-12-02T21:57:59Z] MalformedPolicyDocument: Invalid principal in policy: "AWS":"arn:aws:iam::959325414060:role/bk-1294-f61-a-eks-nodes"
[2020-12-02T21:57:59Z] at Request.extractError (/snapshot/build/node_modules/aws-sdk/lib/protocol/query.js:50:29)
[2020-12-02T21:57:59Z] at Request.callListeners (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
[2020-12-02T21:57:59Z] at Request.emit (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
[2020-12-02T21:57:59Z] at Request.emit (/snapshot/build/node_modules/aws-sdk/lib/request.js:688:14)
[2020-12-02T21:57:59Z] at Request.transition (/snapshot/build/node_modules/aws-sdk/lib/request.js:22:10)
[2020-12-02T21:57:59Z] at AcceptorStateMachine.runTo (/snapshot/build/node_modules/aws-sdk/lib/state_machine.js:14:12)
[2020-12-02T21:57:59Z] at /snapshot/build/node_modules/aws-sdk/lib/state_machine.js:26:10
[2020-12-02T21:57:59Z] at Request.<anonymous> (/snapshot/build/node_modules/aws-sdk/lib/request.js:38:9)
[2020-12-02T21:57:59Z] at Request.<anonymous> (/snapshot/build/node_modules/aws-sdk/lib/request.js:690:12)
[2020-12-02T21:57:59Z] at Request.callListeners (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
[2020-12-02T21:57:59Z] at Request.emit (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
[2020-12-02T21:57:59Z] at Request.emit (/snapshot/build/node_modules/aws-sdk/lib/request.js:688:14)
[2020-12-02T21:57:59Z] at Request.transition (/snapshot/build/node_modules/aws-sdk/lib/request.js:22:10)
[2020-12-02T21:57:59Z] at AcceptorStateMachine.runTo (/snapshot/build/node_modules/aws-sdk/lib/state_machine.js:14:12)
[2020-12-02T21:57:59Z] at /snapshot/build/node_modules/aws-sdk/lib/state_machine.js:26:10
[2020-12-02T21:57:59Z] at Request.<anonymous> (/snapshot/build/node_modules/aws-sdk/lib/request.js:38:9)
[2020-12-02T21:57:59Z] at Request.<anonymous> (/snapshot/build/node_modules/aws-sdk/lib/request.js:690:12)
[2020-12-02T21:57:59Z] at Request.callListeners (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
[2020-12-02T21:57:59Z] at callNextListener (/snapshot/build/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
[2020-12-02T21:57:59Z] at IncomingMessage.onEnd (/snapshot/build/node_modules/aws-sdk/lib/event_listeners.js:313:13)
[2020-12-02T21:57:59Z] at IncomingMessage.emit (events.js:327:22)
[2020-12-02T21:57:59Z] at IncomingMessage.EventEmitter.emit (domain.js:485:12)
[2020-12-02T21:57:59Z] at endReadableNT (_stream_readable.js:1224:12)
[2020-12-02T21:57:59Z] at processTicksAndRejections (internal/process/task_queues.js:84:21) {
[2020-12-02T21:57:59Z] code: 'MalformedPolicyDocument',
[2020-12-02T21:57:59Z] time: 2020-12-02T21:57:59.729Z,
[2020-12-02T21:57:59Z] requestId: '0487635b-a75f-4168-8a32-ccb65cae157b',
[2020-12-02T21:57:59Z] statusCode: 400,
[2020-12-02T21:57:59Z] retryable: false,
[2020-12-02T21:57:59Z] retryDelay: 1000
[2020-12-02T21:57:59Z] }
[2020-12-02T21:57:59Z] 2020-12-02T21:57:59.734Z error: JSON representation of err: {
[2020-12-02T21:57:59Z] "message": "Invalid principal in policy: \"AWS\":\"arn:aws:iam::959325414060:role/bk-1294-f61-a-eks-nodes\"",
[2020-12-02T21:57:59Z] "code": "MalformedPolicyDocument",
[2020-12-02T21:57:59Z] "time": "2020-12-02T21:57:59.729Z",
[2020-12-02T21:57:59Z] "requestId": "0487635b-a75f-4168-8a32-ccb65cae157b",
[2020-12-02T21:57:59Z] "statusCode": 400,
[2020-12-02T21:57:59Z] "retryable": false,
[2020-12-02T21:57:59Z] "retryDelay": 1000
[2020-12-02T21:57:59Z] }
Let's move the createRole() call under AWSResource control.
Well. This resolved itself through retrying, i.e. the underlying issue was eventual consistency within AWS.
Carrying over main findings from https://github.com/opstrace/opstrace-prelaunch/issues/1840.
@triclambert reported that the cluster never became healthy.
@sreis found that the controller was OOM-killed, 4 times: https://github.com/opstrace/opstrace-prelaunch/issues/1840#issuecomment-722553747
We did not root-cause this, and as far as I understand we have no reason to believe this is fixed -- situation may happen again.
Might relate to
https://github.com/opstrace/opstrace-prelaunch/issues/1089
https://github.com/opstrace/opstrace-prelaunch/issues/1089#issuecomment-668786259
kubernetes-client/javascript#494
So far, commit message linting happens voluntarily in the dev's environment. CI does not enforce commit message linting rules. Let's change that.
Potentially follow a few redirects (don't do a login, though). Test for an expected piece of HTML. This covers a lot, and for example provides a strong signal with respect to issues like #72.
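A sketch of the probe's decision logic, assuming redirects have already been followed and we have the final status and body in hand; the marker string is a placeholder for a piece of HTML known to be served by the Opstrace UI:

```typescript
// Sketch: decide whether the UI probe response looks healthy. Expect a
// successful terminal response (redirects already followed) whose body
// contains the expected piece of HTML. Marker choice is an assumption.
function uiProbeLooksHealthy(
  finalStatus: number,
  body: string,
  expectedMarker: string
): boolean {
  if (finalStatus < 200 || finalStatus >= 300) return false;
  return body.includes(expectedMarker);
}
```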
The Opstrace cluster name is used in contexts that impose limitations on length and character set.
Historically, that's how we ended up with current cluster name limitations.
Some of these limitations go back to infrastructure components that we don't use anymore, such as Bigtable. See this commit: b1ea161
// const infraNamePrefix = getInfrastructureName(stack.org, stack.name);
// return `${infraNamePrefix}-idx`;
// Note that the derived Bigtable instance cluster ID must not get longer
// than 30 chars.
return `${stack.name}-idxdb`;
};
TODO: play with this, and see which length limitation is still justified as of today. Then adjust the limit.
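Whatever limit we settle on, the validation could take this shape. The concrete values below (lowercase alphanumeric, starting with a letter, max length 13) are placeholders illustrating the shape of the check, not the outcome of the TODO above:

```typescript
// Sketch of a cluster name validator. Returns null when valid, or a
// human-readable reason when invalid. The limit value is a placeholder
// and must be re-derived from the infrastructure we actually still use.
const MAX_CLUSTER_NAME_LENGTH = 13; // placeholder, to be justified

function validateClusterName(name: string): string | null {
  if (name.length === 0) return "cluster name must not be empty";
  if (name.length > MAX_CLUSTER_NAME_LENGTH)
    return `cluster name must be at most ${MAX_CLUSTER_NAME_LENGTH} characters`;
  if (!/^[a-z][a-z0-9]*$/.test(name))
    return "cluster name must be lowercase alphanumeric, starting with a letter";
  return null;
}
```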
Carried over from opstrace/opstrace-prelaunch/issues/1457, Sep 29.
Having this code:
function* destroyClusterAttemptWithTimeout() {
log.debug("destroyClusterAttemptWithTimeout");
const { timeout } = yield race({
destroy: call(destroyClusterCore),
timeout: delay(DESTROY_ATTEMPT_TIMEOUT_SECONDS * SECOND)
});
if (timeout) {
// Note that in this case redux-saga guarantees to have cancelled the
// task(s) that lost the race, i.e. the `destroy` task above.
...
I've seen that redux-saga tasks spawned (well, fork()ed) deep in the redux-saga task hierarchy do not reliably get cancelled upon said timeout.
from docs (https://redux-saga.js.org/docs/advanced/TaskCancellation.html):
Besides manual cancellation there are cases where cancellation is triggered automatically
In a race effect. All race competitors, except the winner, are automatically cancelled.
The docs also claim that cancellation propagates through the task hierarchy:
So we saw that Cancellation propagates downward (in contrast returned values and uncaught errors propagates upward).
Data:
[2020-09-28T21:14:36Z] })
[2020-09-28T21:14:36Z] 2020-09-28T21:14:36.143Z debug: internet gateway teardown: cycle 598
[2020-09-28T21:14:36Z] 2020-09-28T21:14:36.184Z debug: aws sdk: [AWS ec2 200 0.041s 0 retries]
[...]
[2020-09-28T21:14:36Z] 2020-09-28T21:14:36.184Z info: internet gateway teardown: sleep 10.00 s
[...]
[2020-09-28T21:14:36Z] 2020-09-28T21:14:36.301Z debug: internet gateway teardown: cycle 420
[...]
[2020-09-28T21:14:41Z] 2020-09-28T21:14:41.409Z warning: cluster teardown attempt timed out after 2100 seconds
[2020-09-28T21:14:41Z] 2020-09-28T21:14:41.410Z info: start attempt 4 in 30 s
[2020-09-28T21:14:45Z] 2020-09-28T21:14:45.574Z debug: aws sdk: [AWS ec2 200 0.045s 0 retries] [...]
[2020-09-28T21:14:45Z] 2020-09-28T21:14:45.574Z debug: internet gateway teardown: cycle 210
Seeing these internet gateway teardown cycle numbers makes it obvious that tasks spawned by one cluster teardown iteration survived even after that timed out.
This might be a misuse of redux-saga on our part, but in view of https://github.com/opstrace/opstrace-prelaunch/issues/1445 I think we might want to look into explicit, self-controlled cancellation upon timeout.
From Nov 13:
Btw, this is as present as ever, and whether or not it breaks a user's workflow depends on the context. It's certainly an architectural bug, and quite a messy state. Here, for example, is a scenario where creation tasks overlapped after a high-level timeout "aborted" the first attempt (well, it didn't -- a second high-level attempt just added itself on top of the soup):
[2020-11-13T17:36:19Z] 2020-11-13T17:36:19.640Z info: EKS cluster status: CREATING
[2020-11-13T17:36:19Z] 2020-11-13T17:36:19.642Z info: EKS cluster setup: desired state not reached, sleep 10.00 s
[2020-11-13T17:36:24Z] 2020-11-13T17:36:24.704Z warning: cluster creation attempt timed out after 2400 seconds
[2020-11-13T17:36:24Z] 2020-11-13T17:36:24.705Z info: start attempt 2 in 10 s
[2020-11-13T17:36:29Z] 2020-11-13T17:36:29.643Z debug: EKS cluster setup: cycle 23
[2020-11-13T17:36:30Z] 2020-11-13T17:36:30.148Z debug: aws sdk: [AWS eks 200 0.505s 0 retries] describeCluster({ name: 'bk-2756-fb5-a' })
[2020-11-13T17:36:30Z] 2020-11-13T17:36:30.149Z info: EKS cluster status: CREATING
[2020-11-13T17:36:30Z] 2020-11-13T17:36:30.149Z info: EKS cluster setup: desired state not reached, sleep 10.00 s
[2020-11-13T17:36:34Z] 2020-11-13T17:36:34.705Z debug: createClusterAttemptWithTimeout
[2020-11-13T17:36:34Z] 2020-11-13T17:36:34.705Z info: validate controller config
...
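The "explicit self-controlled cancellation" idea could look like the following, deliberately not using redux-saga: every work cycle checks a shared token that the timeout handler flips, so no cycle can outlive its attempt regardless of how deep in the task tree the loop was spawned. All names here are illustrative:

```typescript
// Minimal sketch of cooperative cancellation: a token shared between the
// timeout handler and every loop doing work on behalf of an attempt.
class CancelToken {
  cancelled = false;
  cancel(): void {
    this.cancelled = true;
  }
}

// Runs teardown cycles until done or cancelled; returns cycles performed.
// `onCycle` stands in for one unit of teardown work (hypothetical).
function teardownLoop(
  token: CancelToken,
  maxCycles: number,
  onCycle: (cycle: number) => void
): number {
  let cycle = 0;
  while (cycle < maxCycles && !token.cancelled) {
    onCycle(cycle);
    cycle += 1;
  }
  return cycle;
}
```

The design point: cancellation is pulled by the worker, not pushed by the framework, so it works even when framework-level propagation (as in the redux-saga case above) is unreliable.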
addlicense implements file discovery using a "pattern" concept -- however, that's not well documented, and the code is also a little dubious.
Also see
https://github.com/google/addlicense/blob/a0294312aa76d31c0bd7e49083d88a2a04d9b3d1/main.go
google/addlicense#38
and
#70
When running the destroy operation with auth tokens generated yesterday:
2020-11-27T16:21:09.528Z info: Try to delete policy matdev-eks-linked-service
2020-11-27T16:21:09.529Z info: Try to delete policy matdev-cortex-s3
2020-11-27T16:21:09.531Z info: Try to delete policy matdev-loki-s3
2020-11-27T16:21:09.532Z info: Try to delete policy matdev-externaldns
2020-11-27T16:21:10.306Z info: All policy-role attachments detached
2020-11-27T16:21:36.913Z error: error during cluster teardown (attempt 1):
Error: Request failed with status code 502
at createError (/snapshot/opstrace/node_modules/axios/lib/core/createError.js:16:15)
at settle (/snapshot/opstrace/node_modules/axios/lib/core/settle.js:17:12)
at IncomingMessage.handleStreamEnd (/snapshot/opstrace/node_modules/axios/lib/adapters/http.js:236:11)
at IncomingMessage.emit (events.js:327:22)
at IncomingMessage.EventEmitter.emit (domain.js:485:12)
at endReadableNT (_stream_readable.js:1224:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
config: [Object],
request: [ClientRequest],
response: [Object],
isAxiosError: true,
toJSON: [Function (anonymous)]
}
2020-11-27T16:21:36.913Z error: JSON representation of err: {
"message": "Request failed with status code 502",
"name": "Error",
"stack": "Error: Request failed with status code 502\n at createError (/snapshot/opstrace/node_modules/axios/lib/core/createError.js:16:15)\n at settle (/snapshot/opstrace/node_modules/axios/lib/core/settle.js:17:12)\n at IncomingMessage.handleStreamEnd (/snapshot/opstrace/node_modules/axios/lib/adapters/http.js:236:11)\n at IncomingMessage.emit (events.js:327:22)\n at IncomingMessage.EventEmitter.emit (domain.js:485:12)\n at endReadableNT (_stream_readable.js:1224:12)\n at processTicksAndRejections (internal/process/task_queues.js:84:21)",
"config": {
"url": "https://dns-api.opstrace.net/dns/",
"method": "delete",
"data": "{\"clustername\":\"matdev\"}",
"headers": {
"Accept": "application/json, text/plain, */*",
"authorization": "Bearer eyJh<snip>A",
"x-opstrace-id-token": "eyJhbG<snip>k8eAw",
"Content-Type": "application/json",
"User-Agent": "axios/0.19.2",
"Content-Length": 24
},
"transformRequest": [
null
],
"transformResponse": [
null
],
"timeout": 0,
"xsrfCookieName": "XSRF-TOKEN",
"xsrfHeaderName": "X-XSRF-TOKEN",
"maxContentLength": -1
}
}
2020-11-27T16:21:36.915Z info: start attempt 2 in 30 s
After deleting the tokens with rm id.jwt access.jwt, the CLI prompted for a new login and the operation succeeded. It would be nice to handle this auth failure automatically: delete the stale local tokens and prompt for login.
This is to enhance robustness of the DNS setup procedure in the installer.
Context: https://github.com/opstrace/opstrace-prelaunch/issues/2042#issuecomment-731293725
We are comparing the main branch and the PR branch using git.
We could try using the GitHub API instead, to get the list of files changed in the PR.
With the git-based comparison, if changes have been merged to the main branch and the docs-only PR branch is not up to date, unnecessary CI builds can be triggered.
Example:
curl -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/opstrace/opstrace/pulls/92/files \
| jq '.[].filename'
And then check if there's only changes to docs files from there.
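That check could be as simple as the following sketch, fed with the filenames from the API response above. The path patterns (docs/ and .md files) are assumptions about where docs live in this repo:

```typescript
// Sketch: decide whether a PR only touches documentation, so the heavy
// CI steps can be skipped. An empty change list is treated as "not
// docs-only" to stay on the safe side.
function isDocsOnlyChange(changedFiles: string[]): boolean {
  if (changedFiles.length === 0) return false;
  return changedFiles.every(
    (f) => f.startsWith("docs/") || f.endsWith(".md")
  );
}
```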
➜ opstrace git:(mat/deploy-ui-bits) ✗ ./build/bin/opstrace destroy aws $OPSTRACE_CLUSTER_NAME --region us-west2
2020-11-27T04:50:59.012Z info: logging to file: opstrace_cli_destroy_20201127-045059Z.log
2020-11-27T04:50:59.013Z info: Discovered AWS credentials. Access key: AKIA...5LWX
2020-11-27T04:50:59.014Z info: About to destroy cluster matdev (aws).
Proceed? [y/N] y
2020-11-27T04:52:08.309Z error: error during cluster teardown (attempt 1):
UnknownEndpoint: UnknownEndpoint: Inaccessible host: `eks.us-west2.amazonaws.com'. This service may not be available in the `us-west2' region.
at throwIfAWSAPIError (/snapshot/opstrace/node_modules/@opstrace/aws/build/util.js:0)
at Object.awsPromErrFilter (/snapshot/opstrace/node_modules/@opstrace/aws/build/util.js:0)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at getCluster (/snapshot/opstrace/node_modules/@opstrace/aws/build/eks.js:0)
at Object.doesEKSClusterExist (/snapshot/opstrace/node_modules/@opstrace/aws/build/eks.js:0)
at getEKSKubeconfig (/snapshot/opstrace/node_modules/@opstrace/uninstaller/build/index.js:0) {
statusCode: undefined
}
2020-11-27T04:52:08.310Z error: JSON representation of err: {
"name": "UnknownEndpoint"
}
This is due to an invalid region, but we should at least report to the user, as early as possible, that the region might be invalid.
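An early sanity check could make "us-west2" fail fast with a helpful message instead of an UnknownEndpoint error minutes into teardown. The list below is a partial snapshot of AWS regions and would need to be kept current (or sourced from the SDK's own endpoint metadata):

```typescript
// Sketch: validate the user-supplied region before making any API calls.
// Returns null when the region looks valid, or an error message.
const KNOWN_AWS_REGIONS = new Set([
  "us-east-1", "us-east-2", "us-west-1", "us-west-2",
  "eu-west-1", "eu-west-2", "eu-central-1",
  "ap-southeast-1", "ap-southeast-2", "ap-northeast-1",
]);

function checkRegion(region: string): string | null {
  if (!KNOWN_AWS_REGIONS.has(region)) {
    return `'${region}' does not look like a valid AWS region (did you mean e.g. 'us-west-2'?)`;
  }
  return null;
}
```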
Got a 429 error from the DNS service in CI: https://github.com/opstrace/opstrace-prelaunch/issues/2035
No further detail in the log (no response body, in particular), so the error stayed mysterious for the moment. Improving that is now tracked in https://github.com/opstrace/opstrace/issues/2034.
I wondered why / how we would breach an HTTP request limit... then I went into the DNS service code and realized that the 429 there does not signal an actual request rate limit.
Then I remembered that we had been talking about that. Did a bit of digging; here I was suggesting to not use a 429 response for that, but a 400 response:
See https://github.com/opstrace/opstrace-prelaunch/issues/1552#issuecomment-713424784
Let's use 429 only for an actual HTTP request rate limit.
For quota/limits I suggest using 400 responses:
AWS VpcLimitExceeded example: #1459
AWS TooManyBuckets example: #1323
(there was no reply about that in #1552).
Still very much my opinion :).
[2020-12-02T10:51:56Z] 2020-12-02T10:51:56.070Z debug: aws-sdk-js request failed (attempt 0): AddressLimitExceeded: The maximum number of addresses has been reached. (retryable, according to sdk: false)
Have the controller copy the secret containing the HTTPS certificate to the tenant namespaces.
We introduced kubed to copy the HTTPS certificate over to the tenant namespaces, to be used by the ingresses. But we can also do this in the controller.
The controller can copy objects across namespaces. Here is an example of using a secret in kube-system as the source of truth and then persisting it in the application namespace, too: https://github.com/opstrace/opstrace/pull/45/files#diff-f7820c94ae287fc0583ec55d49a886b718acfd1247ac7132d6fd03db040c2ef0R159
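Per tenant namespace, the controller would essentially build a copy of the source Secret that is valid to create elsewhere. The important part is stripping server-managed metadata; the minimal Secret type below is a stand-in for the real k8s client types:

```typescript
// Sketch: derive a namespace-local copy of a Secret. Not the real
// @kubernetes/client-node types -- a minimal stand-in for illustration.
interface SecretLike {
  metadata: {
    name: string;
    namespace: string;
    resourceVersion?: string;
    uid?: string;
  };
  type: string;
  data: Record<string, string>;
}

function secretCopyForNamespace(
  src: SecretLike,
  targetNamespace: string
): SecretLike {
  return {
    // Drop resourceVersion/uid: they belong to the source object only
    // and would be rejected (or wrong) on create in another namespace.
    metadata: { name: src.metadata.name, namespace: targetNamespace },
    type: src.type,
    data: { ...src.data },
  };
}
```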
Describe the bug
UI throws errors and the javascript console shows
Error during service worker registration: DOMException: Failed to register a ServiceWorker for
scope ('https://sreis337.opstrace.io/') with script ('https://sreis337.opstrace.io/service-worker.js'):
An SSL certificate error occurred when fetching the script.
This is related to the use of self-signed certificates.
To Reproduce
Create an Opstrace cluster with letsencrypt-staging cert issuer.
After an install, add a new tenant to the cluster config file, then run the create command again (this is the simplest way to reproduce by effectively "adding" a tenant). The new tenant will be created, but the cert is invalid for this tenant's domain. Cert is fine for original domains. @sreis seems like the cert isn't getting copied over to the new tenant's namespace or maybe we're not generating a new cert to cover this domain, now that we combine all subdomains into the same cert?
This cluster has letsencrypt-prod certs, and all other tenants created during install have valid certs.
A tenant added after the original install no longer gets a valid cert:
We should add the .tsx extension to ensure we add the license header to these files too.
The goal is that the entire TypeScript code base is linted (and passing!) using ESLint (and a consistent set of rules).
This is a bigger effort (sometimes requiring quite involved code changes) and we should probably decompose this:
Started doing that for the CLI:
opstrace/packages/cli/package.json
Line 19 in 1e42975
We certainly want to end up in a state where we do not have "lint": "echo done" anymore in our code base :-) (example)
The rules (respected by ESLint in CI, and also by the ESLint extension in VS Code) live here: https://github.com/opstrace/opstrace/blob/main/.eslintrc.js (and can be adjusted).
Not sure if I am using good k8s terminology here.
When ProgressDeadlineExceeded occurs for the controller deployment, the installer will log that (after https://github.com/opstrace/opstrace-prelaunch/pull/2033): https://github.com/opstrace/opstrace-prelaunch/pull/2033#issuecomment-731609108.
That's nice, already much better than before. We can consider https://github.com/opstrace/opstrace-prelaunch/issues/1208 resolved by that.
But we can and should do better in terms of showing reasons / specific errors. A good criterion, I think: when e.g. the image can't be found, show the ImagePullBackOff error. Also see https://github.com/opstrace/opstrace-prelaunch/pull/2033#issuecomment-731630012.
I tried that using our Deployment class but couldn't quite get to the EphemeralContainers objects -- will try again via the raw JS k8s client lib. Also see https://github.com/opstrace/opstrace-prelaunch/pull/2033#issuecomment-731609108 for inspiration from kubectl.
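Surfacing the specific reason could boil down to walking the pod's container statuses. The interface below mirrors only the relevant slice of the Kubernetes pod status (field names match the k8s API; everything else is assumed):

```typescript
// Sketch: collect waiting reasons (e.g. ImagePullBackOff) from a pod's
// container statuses, formatted for the installer's log output.
interface ContainerStatusLike {
  name: string;
  state?: { waiting?: { reason?: string; message?: string } };
}

function waitingReasons(statuses: ContainerStatusLike[]): string[] {
  const reasons: string[] = [];
  for (const cs of statuses) {
    const w = cs.state?.waiting;
    if (w?.reason) {
      reasons.push(
        `${cs.name}: ${w.reason}${w.message ? ` (${w.message})` : ""}`
      );
    }
  }
  return reasons;
}
```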
I think I'd like to have this. Had a lot of success with that before, especially w.r.t. keeping diffs meaningful.
Quick resource dump.
"what she says": https://sembr.org/ :-)
By inserting line breaks at semantic boundaries, writers, editors, and other collaborators can make source text easier to work with, without affecting how it's seen by readers.
From: prometheus-community/helm-charts#25
https://sembr.org/ makes a compelling case for this, and I'm inclined to agree.
https://rhodesmill.org/brandon/2012/one-sentence-per-line/
I agree that semantic line breaks / one sentence per line makes it easier to review changes.
This seems to implement that: https://github.com/JoshuaKGoldberg/sentences-per-line
Too noisy on info log:
2020-12-02T08:25:04.691Z info: aws-sdk-js request failed (attempt 0): InvalidDBInstanceState: Instance jpdev is already being deleted.
2020-12-02T08:25:04.692Z info: RDS Aurora instance teardown: tryDestroy(): ignore aws api error: InvalidDBInstanceState: Instance jpdev is already being deleted. (HTTP status code: 400)
2020-12-02T08:25:05.563Z info: RDS instance status: deleting
2020-12-02T08:25:06.494Z info: RDS cluster status: deleting
2020-12-02T08:25:16.482Z info: aws-sdk-js request failed (attempt 0): InvalidDBInstanceState: Instance jpdev is already being deleted.
2020-12-02T08:25:16.484Z info: RDS Aurora instance teardown: tryDestroy(): ignore aws api error: InvalidDBInstanceState: Instance jpdev is already being deleted. (HTTP status code: 400)
That is, this commit didn't quite do what it was supposed to do: cf44e1f
And this patch wasn't quite sufficient: aws/aws-sdk-js#3402 (does the SDK internally classify many non-retryable errors as retryable?)
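Given that the SDK's own retryable flag appears to be optimistic, a stricter check layered on top of it could look like this sketch. The deny-list entries are examples of codes seen in this document that are clearly not worth retrying; the list itself is an assumption:

```typescript
// Sketch: override aws-sdk-js's `retryable` classification with an
// explicit deny-list of error codes that retrying cannot fix.
const NON_RETRYABLE_AWS_CODES = new Set([
  "InvalidDBInstanceState", // e.g. "is already being deleted"
  "MalformedPolicyDocument",
  "AddressLimitExceeded",
  "VpcLimitExceeded",
  "TooManyBuckets",
]);

function shouldRetryAwsError(err: {
  code?: string;
  retryable?: boolean;
}): boolean {
  if (err.code !== undefined && NON_RETRYABLE_AWS_CODES.has(err.code)) {
    return false;
  }
  return err.retryable === true;
}
```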
[2020-11-20T12:35:23Z] 2020-11-20T12:35:23.255Z info: ServiceLinkedRole(elasticloadbalancing.amazonaws.com) setup: reached desired state, done (duration: 0.37 s)
[2020-11-20T12:35:23Z] 2020-11-20T12:35:23.255Z info: setting up DNS
[2020-11-20T12:35:23Z] 2020-11-20T12:35:23.859Z error: error during cluster creation (attempt 1):
[2020-11-20T12:35:23Z] Error: Request failed with status code 429
In the context of this, we have to (debug-)log HTTP response details (such as the body, or a prefix of it). This is needed for debugging, generally.
Examples for what/how to log:
When launching a dev cluster using a CLI build not built by CI (e.g. when using the current tsc-built index.js), the default for controller_image usually points to an image that does not exist.
So far, I have usually picked the last controller image built and pushed by CI and added a corresponding controller_image: ... value to the cluster config YAML document from which I was trying to launch a cluster. That controller image reference requires a manual lookup (either in a Buildkite build log, or on Docker Hub).
As an improvement for this dev workflow, let's instead have an alias that we can use when manually building up a cluster config file.
That is, let's have CI add a special tag to the controller image of the last passing CI run from main. This make target is used by CI for that.
It's important to appreciate that this moving target / alias has its purpose for a local dev workflow only; i.e. where the ambiguity w.r.t. the actual controller image that you get is deliberately accepted.
Currently, a user cannot know about the shortcut keys that open the command pickers. We need to have a place where they can learn.
With each create operation, the CLI generates fresh key material early in the process: an RSA key pair, and derived authentication tokens for the data API. It does that before doing any remote state inspection.
When the Opstrace cluster already existed before the current create operation was invoked, the create operation will push the new public key into the existing cluster. However, the deployments will not pick it up, and subsequently the CLI will use the new (bad) authentication tokens for probing cluster readiness in the last phase of said create operation.
That will fail with something like
[2020-11-24T20:57:27Z] 2020-11-24T20:57:27.056Z info: https://loki-external.default.bk-2962-d71-a.opstrace.io:8443/loki/api/v1/labels: still waiting, unexpected HTTP response
[2020-11-24T20:57:27Z] 2020-11-24T20:57:27.543Z debug: HTTP response details:
[2020-11-24T20:57:27Z] status: 401
[2020-11-24T20:57:27Z] body[:500]: bad authentication token
This is a known limitation, not yet documented; and a good reason to create this ticket.
This behavior is also certainly a violation of an idempotency constraint that we talk about every now and then (where we think of and want the create operation to be idempotent).
For example, from the current quickstart documentation:
"So you know: The CLI is re-entrant, so if you kill it or it dies, it will pick up right where it left off."
This topic raises so many interesting points!
First-level thinking: we could detect when the k8s cluster & cluster-internal config (controller config) exist, and then not regenerate key material and data API auth tokens; instead try to read existing authentication tokens (and fail when they can't be discovered).
Second-level thinking: the previous thought reveals a bigger-picture insight: when the cluster already exists then we don't want to overwrite any part of the cluster-internal config state -- which is not well-specified yet.
Third-level thinking: OK, we could explicitly ignore the user-given config, emit a clear warning message, and move on with the Nth create operation on the same cluster; trying to inform the admin via log messages that we just ignored the config they provided.
Fourth-level thinking: but this kind of fallback would be too magic -- we'd be ignoring the user-given config file (or parts of it!) without providing a clear signal. No, this must lead to a non-zero exit of the current create operation.
Fifth-level thinking: this means the same command run twice (unaltered) can't just magically do the right thing in an idempotent fashion. There's a logical conflict here. Even with proper config upgrade mechanism we should not magically switch between 'initial create' and 'config upgrade'.
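The branching sketched in the levels above could look roughly like this. All names here are illustrative, not the actual CLI internals:

```typescript
// Hypothetical decision logic for the Nth `create` invocation.
type CreateAction =
  | "fresh-create"           // nothing found: generate keys, create infra
  | "continue-infra"         // cloud infra partially present, no k8s cluster yet
  | "abort-config-conflict"; // k8s cluster exists: no config-diff-upgrade mechanism yet

function decideCreateAction(opts: {
  k8sClusterExists: boolean;           // EKS/GKE cluster discovered (any region)
  cloudInfraPartiallyPresent: boolean; // e.g. VPC/subnets already created
}): CreateAction {
  if (opts.k8sClusterExists) {
    // Fourth-level thinking: never silently ignore the user-given config;
    // exit non-zero instead of magically switching to "config upgrade".
    return "abort-config-conflict";
  }
  if (opts.cloudInfraPartiallyPresent) {
    // Weaker idempotency constraint: cloud infra setup picks up where it
    // previously left off.
    return "continue-infra";
  }
  return "fresh-create";
}
```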
What's the value of the idempotency constraint? I think we have to see that it is a little ignorant -- because we have to think about every aspect of the cluster configuration when discussing idempotency or "continuation" of a previous partial create.
We could specify "well" what it means to overwrite the config (and apply all changes). That could resolve this conflict. Each create operation could simply set the current config, including key material. That sounds like a big, interesting project for later. Done properly, that would, for example, require key rotation in the API proxies w/o downtime.
But note: even with proper config upgrade mechanism we should not magically switch between 'initial create' and 'config upgrade'. That must be an explicit user choice.
We could make the idempotency constraint a little weaker, and apply it only to the cloud infrastructure setup before doing any k8s cluster interaction. I think this would be pragmatic and valuable! That is: implicit continuation, actual idempotency when we talk about cloud infra, but not about the k8s cluster state itself.
How to distinguish those cases? We could, at the beginning of the create operation, check whether a corresponding k8s/EKS cluster already exists (not limiting the search to the region specified via config file, but searching across all regions).
Question: do we allow the same Opstrace cluster name to be re-used across regions in the same cloud account? No (for now, to have something to base decisions on -- but probably forever; that's just too confusing). (This was already decided elsewhere, but needs to be written down.)
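Under that decision, cross-region discovery must treat a duplicate name as an error. The uniqueness rule can be reduced to a pure sketch; the actual region scan via the cloud SDK is omitted here:

```typescript
// clustersByRegion: region -> list of Opstrace cluster names found there
// (the result of a hypothetical per-region listing, not shown).
// Returns the single region hosting `name`, null when not found, and
// throws when the name is -- unexpectedly -- used in multiple regions.
function findClusterRegion(
  clustersByRegion: Map<string, string[]>,
  name: string
): string | null {
  const hits: string[] = [];
  for (const [region, names] of clustersByRegion) {
    if (names.includes(name)) hits.push(region);
  }
  if (hits.length > 1) {
    throw new Error(
      `cluster name '${name}' found in multiple regions: ${hits.join(", ")}`
    );
  }
  return hits.length === 1 ? hits[0] : null;
}
```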
Simply continue -- with key material regeneration :) -- and cloud infra creation (in an idempotent fashion, i.e. this will automatically pick up where it previously left off).
Continue. There's no conflict here (again: idempotency).
Then we should abort and exit non-zero, saying that we don't yet have a config-diff-upgrade mechanism.
Then offer a continuation run with a CLI flag (e.g. --continue) that will move past this point.
This --continue mode would not accept a cluster config, or would explicitly ignore various parts of the cluster configuration file; and it requires successful discovery of authentication token files from a previous create run. To me, that's another argument for: authentication token files should actually always live in a directory, as an atomic unit; and we want to introduce a command line flag for discovering that.
I begin to see:
opstrace create aws jpdev --continue [--api-token-dir PATH]
--api-token-dir PATH is for discovery in this case; it has a default, matching the corresponding default for writing these files (upon the first, happy-path create).
--continue is for the second cmd invocation (so that one does not need to remove -c or stdin?).
Definite TODO: write authentication tokens to disk only when we are certain that we are creating a new cluster.
That means: delay writing the authentication tokens (compared to what we do today).
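Writing all token files into one directory as an atomic unit, only after the existence checks have confirmed a fresh create, might look like the following sketch. The per-tenant file naming is an assumption, not the CLI's actual layout:

```typescript
import * as fs from "fs";
import * as path from "path";

// Assumption: one file per tenant, named <tenant>-api-token, mode 0600.
// Call this only after cluster existence checks confirmed a fresh create --
// i.e. later than the CLI writes these files today.
function writeTenantApiTokens(
  dir: string,
  tokens: Map<string, string> // tenant name -> authentication token
): string[] {
  fs.mkdirSync(dir, { recursive: true });
  const written: string[] = [];
  for (const [tenant, token] of tokens) {
    const fpath = path.join(dir, `${tenant}-api-token`);
    // Tokens are secrets: restrict to the owning user.
    fs.writeFileSync(fpath, token, { mode: 0o600 });
    written.push(fpath);
  }
  return written;
}
```

The same directory would then be the default for `--api-token-dir` discovery on a later invocation.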
And here we are, Nth-level thinking: this create --continue is basically the status command that we already have; or at least it's getting super close. I think: no, let's not do opstrace create aws jpdev --continue [--api-token-dir PATH] -- let's re-think status, maybe call it differently.
opstrace wait aws jpdev [--api-token-dir PATH]
It will find the EKS cluster (look for it in all regions), get the list of tenants from the cluster, and then use the API tokens to do its thing.
This would also resolve my major 'design complaint' about the current status implementation: it requires the config file, but shouldn't: https://github.com/opstrace/opstrace/blob/e44644d78f01659cf7d69ee44d6658a0f9119059/packages/cli/src/status.ts#L59
Other thoughts triggered by this discussion
https://buildkite.com/opstrace/prs/builds/3121#5648443f-a53b-4e7b-aef5-bd7715eceb2a/3715
checking cluster is using certificate issued by LetsEncrypt
[2020-12-02T21:36:04Z] + openssl x509 -noout -issuer
[2020-12-02T21:36:04Z] + openssl s_client -showcerts -connect system.bk-3121-df2-a.opstrace.io:443
[2020-12-02T21:36:04Z] + grep 'Fake LE Intermediate'
[2020-12-02T21:38:14Z] 140420438553728:error:0200206E:system library:connect:Connection timed out:../crypto/bio/b_sock2.c:110:
[2020-12-02T21:38:14Z] 140420438553728:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
[2020-12-02T21:38:14Z] connect:errno=110
[2020-12-02T21:38:14Z] + teardown
[2020-12-02T21:38:14Z] + LAST_EXITCODE_BEFORE_TEARDOWN=1
Something like opstrace kubectl aws jpdev?
input:
(same two parameters as for destroy)
For debuggability.
For replacing the make kconfig-* make targets.
Can refer to that in CLI-emitted log msgs/error msgs.
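What such a subcommand could generate internally is a minimal kubeconfig document for the discovered cluster. This is a sketch: the field values come from the EKS/GKE API in reality, and the exec-based auth shown is the common EKS pattern (`aws eks get-token`), not necessarily what we'd ship:

```typescript
// Builds a minimal kubeconfig object for a discovered EKS cluster.
// endpoint/caData are placeholders for values returned by the cloud API.
function buildKubeconfig(opts: {
  clusterName: string;
  endpoint: string; // e.g. https://ABC123.gr7.us-west-2.eks.amazonaws.com
  caData: string;   // base64-encoded cluster CA certificate
}) {
  const ctx = `opstrace-${opts.clusterName}`;
  return {
    apiVersion: "v1",
    kind: "Config",
    clusters: [
      {
        name: ctx,
        cluster: {
          server: opts.endpoint,
          "certificate-authority-data": opts.caData
        }
      }
    ],
    users: [
      {
        name: ctx,
        user: {
          // Credential plugin: kubectl shells out for a fresh token.
          exec: {
            apiVersion: "client.authentication.k8s.io/v1beta1",
            command: "aws",
            args: ["eks", "get-token", "--cluster-name", opts.clusterName]
          }
        }
      }
    ],
    contexts: [{ name: ctx, context: { cluster: ctx, user: ctx } }],
    "current-context": ctx
  };
}
```

Serialized to YAML and written to a temp file, this is what the wrapped kubectl invocation would point its KUBECONFIG at.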
A CLI build from main, from today.
2020-12-01T16:02:16.273Z debug: CLI build information: {
"BRANCH_NAME": "main",
"VERSION_STRING": "5b8ea45a-ci",
"COMMIT": "5b8ea45",
"BUILD_TIME_RFC3339": "2020-12-01 12:05:51+00:00",
"BUILD_HOSTNAME": "0d5430f98c0d"
}
$ ./opstrace create aws jpdev -c ~/dev/opstrace/ci/cluster-config.yaml
...
2020-12-01T16:02:16.277Z debug: user-given cluster config parsed. JSON representation:
{
"tenants": [
"default"
],
"env_label": "ci",
"node_count": 3
}
...
2020-12-01T16:22:27.526Z info: waiting for 3 StatefulSets
2020-12-01T16:22:27.526Z info: waiting for 2 Certificates
2020-12-01T16:22:27.526Z debug: Waiting for Certificate ingress/https-cert to be ready
2020-12-01T16:22:27.527Z debug: Waiting for Certificate ingress/kubed-apiserver-cert to be ready
...
2020-12-01T16:42:18.212Z info: waiting for 0 Deployments
2020-12-01T16:42:18.212Z info: waiting for 0 DaemonSets
2020-12-01T16:42:18.213Z info: waiting for 0 StatefulSets
2020-12-01T16:42:18.213Z info: waiting for 1 Certificates
2020-12-01T16:42:18.213Z debug: Waiting for Certificate ingress/https-cert to be ready
2020-12-01T16:42:19.127Z info: shutting down k8s informers
2020-12-01T16:42:19.129Z warning: cluster creation attempt timed out after 2400 seconds
2020-12-01T16:42:19.130Z info: start attempt 2 in 10 s
...
2020-12-01T17:16:00.171Z info: waiting for 0 Deployments
2020-12-01T17:16:00.172Z info: waiting for 0 DaemonSets
2020-12-01T17:16:00.172Z info: waiting for 0 StatefulSets
2020-12-01T17:16:00.172Z info: waiting for 1 Certificates
2020-12-01T17:16:00.172Z debug: Waiting for Certificate ingress/https-cert to be ready
@sreis I would appreciate it if you could have a look here.
The cluster is marked ready but is using an invalid certificate.
See previous discussion.
Wait for certificate to be marked ready before proceeding with install.
Query an endpoint and check the certificate issuer.
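Deciding "ready" from a cert-manager Certificate object means inspecting its Ready status condition. A sketch of that check, over the condition shape cert-manager uses (simplified types, not the actual client code):

```typescript
interface CertCondition {
  type: string;                          // e.g. "Ready"
  status: "True" | "False" | "Unknown";
  reason?: string;
}

// cert-manager marks a Certificate ready via a status condition of
// type "Ready" with status "True"; anything else (including a missing
// status block) means we must keep waiting before declaring the
// cluster ready.
function certificateIsReady(cert: {
  status?: { conditions?: CertCondition[] };
}): boolean {
  const conds = cert.status?.conditions ?? [];
  return conds.some((c) => c.type === "Ready" && c.status === "True");
}
```

The second validation step (querying an endpoint and checking the certificate issuer) would then catch cases where "Ready" was reported against the wrong issuer.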
[2020-12-02T12:37:00Z] 2020-12-02T12:37:00.972Z info: HTTP resp to GET(https://loki.system.bk-1290-878-a.opstrace.io/loki/api/v1/query_range?query=%7Bk8s_namespace_name%3D%22loki%22%2C+k8s_container_name%3D%22ingester%22%7D+%7C%3D+%22Starting+Loki%22&direction=BACKWARD®exp=&limit=10&start=1606552581861000000&end=1606916181861000000):
[2020-12-02T12:37:00Z] status: 502
[2020-12-02T12:37:00Z] body[:500]: EOF
[2020-12-02T12:37:00Z] headers: {"server":"openresty/1.15.8.2","date":"Wed, 02 Dec 2020 12:37:00 GMT","content-type":"text/plain; charset=utf-8","content-length":"3","connection":"close","strict-transport-security":"max-age=15724800; includeSubDomains"}
[2020-12-02T12:37:00Z] totalTime: 39.108 s
[2020-12-02T12:37:00Z] dnsDone->TCPconnectDone: 0.001 s
[2020-12-02T12:37:00Z] connectDone->reqSent 0 s
[2020-12-02T12:37:00Z] reqSent->firstResponseByte: 39.067 s
Describe the bug
The link to community discussions at the end of the Roadmap is not working.
(I assume it's because I'm not a member of the organization, so maybe that is intentional.)
https://opstrace.com/docs/references/roadmap
The link leads to
https://go.opstrace.com/community
which redirects to
https://github.com/opstrace/opstrace/discussions
where I get a 404 response.
To Reproduce
Expected behavior
I expected to be redirected to a page where I can see ongoing community discussions or a note that this is intended only for members of the organization.
[2020-11-30T20:39:27Z] 2020-11-30T20:39:27.951Z info: HTTP resp to GET(https://loki.default.bk-1278-bfb-a.opstrace.io/loki/api/v1/query_range?query=%7Bdummystream%3D%22test-remote-ldi-1ZqwhA-0003%22%7D&direction=FORWARD&limit=20000&start=1606768759642000000&end=1606768759644000000):
[2020-11-30T20:39:27Z] status: 500
[2020-11-30T20:39:27Z] body[:500]: rpc error: code = Internal desc = received 252090-bytes data exceeding the limit 242040 bytes
[2020-11-30T20:39:27Z]
[2020-11-30T20:39:27Z] headers: {"server":"openresty/1.15.8.2","date":"Mon, 30 Nov 2020 20:39:27 GMT","content-type":"text/plain; charset=utf-8","content-length":"94","connection":"close","strict-transport-security":"max-age=15724800; includeSubDomains","x-content-type-options":"nosniff"}
[2020-11-30T20:39:27Z] totalTime: 0.258 s
[2020-11-30T20:39:27Z] dnsDone->TCPconnectDone: 0.003 s
[2020-11-30T20:39:27Z] connectDone->reqSent 0 s
[2020-11-30T20:39:27Z] reqSent->firstResponseByte: 0.215 s
[2020-11-30T20:39:27Z]
[2020-11-30T20:39:27Z] 1) long dummystream insert, validate via query
From prelaunch repo, Sep 2021.
Thinking far into the future, there might be two "cluster teardown" modes of interest:
To date, we seem to have been focusing on (1) only; at least the current cluster destroy operation is closer to (1) than to (2), because (2) takes a whole lot of thinking & verification/testing work.
But we're not really consistent even within (1) -- for example, we wait for the Kubernetes deployments to cleanly terminate, which should only be necessary in the context of (2).
Systematic / clear thinking along these lines, especially for (1), might save a lot of work today.
we wait for the kubernetes deployments to cleanly terminate
Btw, we do this to get away with not tearing down disks/EBS volumes "manually".
Describe the bug
checking cluster is using certificate issued by LetsEncrypt
[2020-11-25T00:17:16Z] + openssl s_client -showcerts -connect system.bk-1240-2ac-g.opstrace.io:443
[2020-11-25T00:17:16Z] + grep 'Fake LE Intermediate'
[2020-11-25T00:17:16Z] + openssl x509 -noout -issuer
[2020-11-25T00:17:16Z] 140606379885696:error:2008F002:BIO routines:BIO_lookup_ex:system lib:../crypto/bio/b_addr.c:724:No address associated with hostname
[2020-11-25T00:17:16Z] connect:errno=0
[2020-11-25T00:17:16Z] + teardown
[2020-11-25T00:17:16Z] + LAST_EXITCODE_BEFORE_TEARDOWN=1
Seen here: https://buildkite.com/opstrace/scheduled-main-builds/builds/1240#2fc9a08e-bfc0-41f8-8992-fbee901b3631/988 (scheduled build from main)
To Reproduce
Flaky issue only seen, thus far, in GCP.
Expected behavior
Cluster should be made ready with the requested certificate.
This is a follow-up from comments/observations made previously:
ELBs created by entities running on a k8s cluster encode the k8s cluster name in a tag name:
kubernetes.io/cluster/<k8s-cluster-name>
Note that "k8s cluster name" is actually the GKE / EKS cluster name.
Using this tag might be the best bet for teardown to detect resources belonging to <opstrace-cluster-name>.
This assumes that (i.e. it only needs to be done if) we cannot detect (1) reliably by e.g. setting and reading opstrace_cluster_name.
If we go down this path then we need to encode, in the k8s cluster name, the fact that a k8s cluster (GKE / EKS cluster) belongs to an Opstrace cluster, e.g. via a prefix.
That means: while today the k8s cluster name corresponds to the Opstrace cluster name, in the future we might want to have k8s_cluster_name = "opstrace-${opstrace_cluster_name}".
This approach could replace the 'detect-elbs-belonging-to-opstrace-cluster-via-vpc-association-technique' (https://github.com/opstrace/opstrace-prelaunch/blob/39a5e869d171655268b40dac8330a54801f05683/lib/aws/src/vpc.ts#L14), and could also be applied for other resource types, such as persistent volumes!
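With the naming scheme proposed above, the tag-based detection reduces to a pure predicate over a resource's tag keys, independent of the AWS SDK plumbing (sketch; the prefix constant matches the in-cluster controllers' convention):

```typescript
// ELBs (and some other resources) created by in-cluster controllers carry
// a tag key of the form kubernetes.io/cluster/<k8s-cluster-name>.
const K8S_CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/";

// With k8s_cluster_name = `opstrace-${opstrace_cluster_name}` (the naming
// scheme proposed above), ownership can be decided from tag keys alone.
function belongsToOpstraceCluster(
  tagKeys: string[],
  opstraceClusterName: string
): boolean {
  const expected = `${K8S_CLUSTER_TAG_PREFIX}opstrace-${opstraceClusterName}`;
  return tagKeys.includes(expected);
}
```

Teardown would list resources per type, fetch their tags, and delete those for which this predicate holds -- the same filter working for ELBs and persistent volumes alike.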