dremio-diagnostic-collector's People

Contributors

dependabot[bot], markcurtis1970, mxmarg, nleaman, rsvihladremio

dremio-diagnostic-collector's Issues

Replace glog

Glog spams the temp folder and is not very pretty; even with a logdir passed, it produces lots of output for something so simple. I suggest using something like the following and removing glog from the project:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

const (
	LevelError = iota
	LevelWarning
	LevelInfo
	LevelDebug
)

type Logger struct {
	Debug   *log.Logger
	Info    *log.Logger
	Warning *log.Logger
	Error   *log.Logger
}

// NewLogger enables every level up to and including the given one.
// Debug, info and warning go to stdout; errors always go to stderr.
func NewLogger(level int) *Logger {
	debugOut, infoOut, warningOut := io.Discard, io.Discard, io.Discard

	switch level {
	case LevelDebug:
		debugOut = os.Stdout
		fallthrough
	case LevelInfo:
		infoOut = os.Stdout
		fallthrough
	case LevelWarning:
		warningOut = os.Stdout
	}

	return &Logger{
		Debug:   log.New(debugOut, "DEBUG: ", log.Ldate|log.Ltime|log.Lshortfile),
		Info:    log.New(infoOut, "INFO: ", log.Ldate|log.Ltime|log.Lshortfile),
		Warning: log.New(warningOut, "WARNING: ", log.Ldate|log.Ltime|log.Lshortfile),
		Error:   log.New(os.Stderr, "ERROR: ", log.Ldate|log.Ltime|log.Lshortfile),
	}
}

func (l *Logger) Debugf(format string, v ...interface{}) {
	l.Debug.Output(2, fmt.Sprintf(format, v...))
}

func (l *Logger) Infof(format string, v ...interface{}) {
	l.Info.Output(2, fmt.Sprintf(format, v...))
}

func (l *Logger) Warningf(format string, v ...interface{}) {
	l.Warning.Output(2, fmt.Sprintf(format, v...))
}

func (l *Logger) Errorf(format string, v ...interface{}) {
	l.Error.Output(2, fmt.Sprintf(format, v...))
}

func main() {
	logger := NewLogger(LevelDebug)

	logger.Debugf("This is a %s message", "debug")
	logger.Infof("This is an %s message", "information")
	logger.Warningf("This is a %s message", "warning")
	logger.Errorf("This is an %s message", "error")
}

Archives will fail on k8s "../" file paths

In K8s installs the conf directory is often softlinked to /opt/dremio/data, and the path uses a ../ notation that can fail when zipping the archive. For example:

2022/10/13 18:46:44 args: /usr/local/bin/kubectl exec -n default -c dremio-executor dremio-executor-1 -- bash -c find /opt/dremio/conf/..data/ -maxdepth 3 -type f

These can cause a fatal error, and the temp directory is purged without an archive being created. We should still log this error but continue.
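A minimal sketch of the "log and continue" behaviour, assuming each collected file is named inside the archive relative to the temporary collection directory. `archiveName` and the example paths are hypothetical, not ddc's code. It also guards against a pitfall specific to this layout: a naive `strings.HasPrefix(rel, "..")` check would wrongly reject the legitimate `..data` directory.

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
	"strings"
)

// archiveName returns the name a collected file should have inside the archive,
// relative to the collection root. It rejects paths that escape the root, but
// accepts directory names that merely start with "..", such as "..data".
func archiveName(root, path string) (string, error) {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return "", err
	}
	// only a leading ".." path element means the file is outside root
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes collection root %q", path, root)
	}
	return filepath.ToSlash(rel), nil
}

func main() {
	root := "/tmp/ddc"
	files := []string{
		"/tmp/ddc/coordinators/dremio-master-0/conf/..data/dremio.conf",
		"/tmp/outside/secret.txt",
	}
	for _, f := range files {
		name, err := archiveName(root, f)
		if err != nil {
			// log the problem and keep going instead of aborting the archive
			log.Printf("skipping %v: %v", f, err)
			continue
		}
		fmt.Println("adding", name)
	}
}
```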

Windows based collection uses wrong separator notation in find command

Saw this when working with a customer running on Windows today:

2022/10/17 17:50:20 capture.go:74: ERROR: host default.dremio-master-0 unable to find files in directory \opt\dremio\conf\..data with error file search failed failed due to error unable to start command 'kubectl exec -n default -c dremio-master-coordinator dremio-master-0 -- find \opt\dremio\conf\..data/ -maxdepth 3 -type f' due to error 'exit status 1'
2022/10/17 17:50:20 args: kubectl exec -n default -c dremio-master-coordinator dremio-master-0 -- find \var\log\dremio/ -maxdepth 3 -type f -mtime -5
find: ‘\\opt\\dremio\\conf\\..data/’: No such file or directory
command terminated with exit code 1
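The backslashes in the log are consistent with the remote path being built with `path/filepath`, which uses `\` when ddc itself runs on Windows. A minimal sketch, assuming pod paths are always POSIX: build them with the `path` package (forward slashes on every OS) and normalize any stray backslashes. `remoteFindArgs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// remoteFindArgs builds the find command run inside the pod. Pod paths are
// always POSIX, so the "path" package is used instead of "path/filepath".
func remoteFindArgs(dir string) []string {
	// normalize backslashes produced by a Windows-side filepath.Join
	posixDir := path.Clean(strings.ReplaceAll(dir, `\`, "/"))
	return []string{"find", posixDir + "/", "-maxdepth", "3", "-type", "f"}
}

func main() {
	fmt.Println(strings.Join(remoteFindArgs(`\opt\dremio\conf\..data`), " "))
}
```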

AWSE logs get bundled under coordinator IPs by default

The AWSE deployment typically archives logs for all coordinators and executors under this kind of structure:

$ tree -d /var/dremio_efs/
/var/dremio_efs/
├── log
│   ├── coordinator
│   │   ├── archive
│   │   ├── json
│   │   │   └── archive
│   │   └── preview
│   │       ├── archive
│   │       └── json
│   │           └── archive
│   └── executor
│       └── ip-10-10-10-176.eu-west-1.compute.internal
│           ├── archive
│           └── json
│               └── archive
└── thirdparty

When we collect logs in the default healthcheck format, we put all logs under the ../logs/IP-C/.. directory. This means executor logs with the same name clobber each other, and nothing in the path identifies which executor they came from.

We must either handle this by adding some identifier to the path, or by marshalling each file to the right executor path when we detect the directory structure above.
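A sketch of the second option, assuming files are found under the shared EFS log root shown in the tree above and that the bundle uses the `executors/<host>/log` and `coordinators/<host>/log` layout seen in other ddc output. `bundlePath`, the coordinator IP and the example file names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// bundlePath maps a file under the shared AWSE log root (e.g. /var/dremio_efs/log)
// to a destination inside the diagnostic bundle. Files under executor/<host>/ go
// to executors/<host>/ so identically named logs from different executors no
// longer overwrite each other; everything else stays with the coordinator.
func bundlePath(logRoot, src, coordinatorIP string) string {
	rel := strings.TrimPrefix(path.Clean(src), path.Clean(logRoot)+"/")
	parts := strings.SplitN(rel, "/", 3) // executor, <host>, rest of path
	if len(parts) == 3 && parts[0] == "executor" {
		return path.Join("executors", parts[1], "log", parts[2])
	}
	return path.Join("coordinators", coordinatorIP, "log", rel)
}

func main() {
	root := "/var/dremio_efs/log"
	for _, src := range []string{
		"/var/dremio_efs/log/executor/ip-10-10-10-176.eu-west-1.compute.internal/json/server.json",
		"/var/dremio_efs/log/coordinator/server.log",
	} {
		fmt.Println(src, "->", bundlePath(root, src, "10.10.10.5"))
	}
}
```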

Add target directories for AWSE preview engine logs

When we pull back logs we recreate the directory structure of the conf and log directories; when testing on AWSE we skip the preview engine directories because we haven't yet created them:

2022/11/04 14:46:47 capture.go:368: ERROR: unable to copy /etc/dremio/preview/logback-admin.xml from host 34.240.72.72 due to error unable to start command 'scp -i /Users/mc/Support/mc-ssh-ireland.cer -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no [email protected]:/etc/dremio/preview/logback-admin.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/1265787823/coordinators/34.240.72.72/conf/preview/logback-admin.xml' due to error 'exit status 1'

support dynamically finding gc log on JDK 11+

https://dzone.com/articles/disruptive-changes-to-gc-logging-in-java-9

-Xloggc:mygc.log is deprecated, and there is a newer option, -Xlog:gc:file=mygc.log, which ddc does not yet support finding.

This just needs an edit to the following function (with some tests) so it handles both formats, since both will be supported for a while:

import (
	"fmt"
	"strings"
)

// ParseGCLogFromFlags returns the path given by the last -Xloggc: flag in the
// JVM startup flags, or "" if no such flag is present.
func ParseGCLogFromFlags(startupFlagsStr string) (gcLogLocation string, err error) {
	tokens := strings.Split(startupFlagsStr, " ")
	var found []int
	for i, token := range tokens {
		if strings.HasPrefix(token, "-Xloggc:") {
			found = append(found, i)
		}
	}
	if len(found) == 0 {
		return "", nil
	}
	// if the flag is repeated, use the last occurrence
	lastIndex := found[len(found)-1]
	last := tokens[lastIndex]
	gcLogLocationTokens := strings.Split(last, "-Xloggc:")
	if len(gcLogLocationTokens) != 2 {
		return "", fmt.Errorf("unexpected items in string '%v', expected only 2 items but found %v", last, len(gcLogLocationTokens))
	}
	return gcLogLocationTokens[1], nil
}
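A sketch of a version that accepts both formats. It is not ddc's implementation: it switches to `strings.Fields` so repeated spaces or tabs do not produce empty tokens, it only recognizes the `file=` output form mentioned above, and it does not handle quoted paths.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseGCLogFromFlags returns the GC log location found in JVM startup flags.
// It accepts both the legacy -Xloggc:<path> flag and the unified logging form
// -Xlog:gc...:file=<path>. When several matching flags are present the last one
// wins, and "" with a nil error means no GC log file is configured.
func ParseGCLogFromFlags(startupFlagsStr string) (string, error) {
	location := ""
	for _, token := range strings.Fields(startupFlagsStr) {
		switch {
		case strings.HasPrefix(token, "-Xloggc:"):
			location = strings.TrimPrefix(token, "-Xloggc:")
		case strings.HasPrefix(token, "-Xlog:gc"):
			// -Xlog:<selectors>:<output>:<decorators>:<output-options>
			parts := strings.Split(strings.TrimPrefix(token, "-Xlog:"), ":")
			if len(parts) < 2 || !strings.HasPrefix(parts[1], "file=") {
				// no output given, stdout/stderr, or a form this sketch ignores
				continue
			}
			file := strings.TrimPrefix(parts[1], "file=")
			if file == "" {
				return "", fmt.Errorf("empty file= in GC logging flag %q", token)
			}
			location = file
		}
	}
	return location, nil
}

func main() {
	for _, flags := range []string{
		"-Xms4g -Xloggc:/var/log/dremio/gc.log",
		"-Xms4g -Xlog:gc*:file=/var/log/dremio/gc.log:time,uptime:filecount=5,filesize=10M",
		"-Xms4g",
	} {
		loc, err := ParseGCLogFromFlags(flags)
		fmt.Printf("%q -> %q (err: %v)\n", flags, loc, err)
	}
}
```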

Archiving fails silently on unknown suffix

We silently fail to create the archive if we pass in an unknown extension such as .tar.gz. or .blah; the code in collector.go:archiveDiagDirectory does not handle the unknown extension and fails silently.

We should clean up any stray spaces around the suffix and also inform the user if they try to use an unrecognised suffix, perhaps defaulting to .zip.
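A minimal sketch of that check. The supported suffixes (.tar.gz, .tgz, .zip) are an assumption based on the output names seen elsewhere in these issues, and `normalizeOutput` is hypothetical. Besides spaces it also trims trailing dots, so `diag.tar.gz.` is treated as `diag.tar.gz`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedSuffixes lists the archive types this sketch assumes ddc can write.
var supportedSuffixes = []string{".tar.gz", ".tgz", ".zip"}

// normalizeOutput trims stray spaces and trailing dots from the requested output
// name. If the suffix is still not recognized it swaps the extension for .zip and
// reports that it did so, so the caller can warn instead of failing silently.
func normalizeOutput(output string) (name string, defaulted bool) {
	name = strings.TrimRight(strings.TrimSpace(output), ".")
	lower := strings.ToLower(name)
	for _, suffix := range supportedSuffixes {
		if strings.HasSuffix(lower, suffix) {
			return name, false
		}
	}
	return strings.TrimSuffix(name, filepath.Ext(name)) + ".zip", true
}

func main() {
	for _, out := range []string{" diag.tar.gz ", "diag.tar.gz.", "diag.blah"} {
		name, defaulted := normalizeOutput(out)
		if defaulted {
			fmt.Printf("WARNING: unrecognized suffix in %q, writing %s instead\n", out, name)
			continue
		}
		fmt.Println("writing", name)
	}
}
```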

Block wildcard searches

In some cases the GC log search will not detect the GC log from the JVM flag and will therefore return a wildcard. We should block wildcards as the base search directory to prevent the tool from globbing all files on the pods and bundling them into the diag tarball.
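A sketch of such a guard, assuming the base directory is validated before the remote find is built. `validateSearchDir` is hypothetical; it also rejects an empty value and the filesystem root, which would cause the same over-collection.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// validateSearchDir refuses base directories that would make the remote find
// sweep up far more than logs: an empty value, the filesystem root, or any
// path containing a glob metacharacter.
func validateSearchDir(dir string) error {
	d := strings.TrimSpace(dir)
	if d == "" || path.Clean(d) == "/" {
		return fmt.Errorf("refusing to search %q: no specific directory given", dir)
	}
	if strings.ContainsAny(d, "*?[") {
		return fmt.Errorf("refusing to search %q: wildcards are not allowed in the base directory", dir)
	}
	return nil
}

func main() {
	for _, dir := range []string{"/var/log/dremio", "*", "/var/log/*", "/"} {
		if err := validateSearchDir(dir); err != nil {
			fmt.Println("ERROR:", err)
			continue
		}
		fmt.Println("ok:", dir)
	}
}
```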

When archiving a directory we hit an error

2022/07/28 16:33:59 unexpected error running collection 'unable to write tar file /Users/ryan.svihla/Downloads/diag-with-gc.tgz due to error read /var/folders/4q/s6vlfh4d5hnclb0kzjlkdgjc0000gp/T/3870758071/executors/default.dremio-executor-0/log/json: is a directory'

2022/07/28 16:35:58 unexpected error running collection 'unable to write zip file /Users/ryan.svihla/Downloads/diag-with-gc.zip due to error read /var/folders/4q/s6vlfh4d5hnclb0kzjlkdgjc0000gp/T/4098973776/executors/default.dremio-executor-1/log/json: is a directory'

capture linux configuration

It would be handy to capture the following Linux information:

  1. /var/log/system.log
  2. /etc/fstab
  3. /etc/security/limits.conf
  4. /etc/security/limits.d/*

EKS - collection for coordinator looks for "dremio-master-0"

With the following cluster

% kubectl get pods -n default --show-labels
NAME                READY   STATUS    RESTARTS   AGE   LABELS
dremio-executor-0   1/1     Running   0          21h   app=dremio-executor,controller-revision-hash=dremio-executor-7678d5f9b9,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-executor-0
dremio-executor-1   1/1     Running   0          21h   app=dremio-executor,controller-revision-hash=dremio-executor-7678d5f9b9,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-executor-1
dremio-master-0     1/1     Running   0          20h   app=dremio-coordinator,controller-revision-hash=dremio-master-84db9ccfd7,role=dremio-cluster-pod,statefulset.kubernetes.io/pod-name=dremio-master-0
zk-0                1/1     Running   0          22h   app=zk,controller-revision-hash=zk-5d9758fddd,statefulset.kubernetes.io/pod-name=zk-0
zk-1                1/1     Running   0          22h   app=zk,controller-revision-hash=zk-5d9758fddd,statefulset.kubernetes.io/pod-name=zk-1
zk-2                1/1     Running   0          22h   app=zk,controller-revision-hash=zk-5d9758fddd,statefulset.kubernetes.io/pod-name=zk-2

We run the following command

./bin/ddc --k8s --coordinator default:app=dremio-coordinator --executors default:app=dremio-executor --output ./eks-diag.tar.gz

The following output is seen

ddc v0.1.0-7375a13

2022/07/19 10:50:36 using Kubernetes kubectl based collection
2022/07/19 10:50:36 args kubectl get pods -n default -l app=dremio-coordinator -o name
2022/07/19 10:50:38 pod/dremio-master-0
2022/07/19 10:50:38 dremio-master-0
2022/07/19 10:50:38 args kubectl get pods -n default -l app=dremio-executor -o name
2022/07/19 10:50:38 args kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- iostat -y -x -d -c -t 1 60
2022/07/19 10:50:38 pod/dremio-executor-0
2022/07/19 10:50:38 dremio-executor-0
2022/07/19 10:50:38 pod/dremio-executor-1
2022/07/19 10:50:38 dremio-executor-1
2022/07/19 10:50:38 args kubectl exec -it -n default -c dremio-executor dremio-executor-1 -- iostat -y -x -d -c -t 1 60
2022/07/19 10:50:38 args kubectl exec -it -n default -c dremio-executor dremio-executor-0 -- iostat -y -x -d -c -t 1 60
2022/07/19 10:50:38 capture.go:103: ERROR: host default.dremio-master-0 failed iostat with error unable to start command 'kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- iostat -y -x -d -c -t 1 60' due to error 'exit status 1'
2022/07/19 10:50:38 args kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- ls -1 /opt/dremio/conf/..data
2022/07/19 10:50:39 capture.go:62: ERROR: host default.dremio-master-0 unable to find files in directory /opt/dremio/conf/..data with error ls -l failed due to error unable to start command 'kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- ls -1 /opt/dremio/conf/..data' due to error 'exit status 1'
2022/07/19 10:50:39 args kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- ls -1 /var/log/dremio
2022/07/19 10:50:39 capture.go:103: ERROR: host default.dremio-executor-0 failed iostat with error unable to start command 'kubectl exec -it -n default -c dremio-executor dremio-executor-0 -- iostat -y -x -d -c -t 1 60' due to error 'exit status 126'
2022/07/19 10:50:39 args kubectl exec -it -n default -c dremio-executor dremio-executor-0 -- ls -1 /opt/dremio/conf/..data
2022/07/19 10:50:39 capture.go:103: ERROR: host default.dremio-executor-1 failed iostat with error unable to start command 'kubectl exec -it -n default -c dremio-executor dremio-executor-1 -- iostat -y -x -d -c -t 1 60' due to error 'exit status 126'
2022/07/19 10:50:39 args kubectl exec -it -n default -c dremio-executor dremio-executor-1 -- ls -1 /opt/dremio/conf/..data
2022/07/19 10:50:39 capture.go:76: ERROR: host default.dremio-master-0 unable to find files in directory /var/log/dremio with error ls -l failed due to error unable to start command 'kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- ls -1 /var/log/dremio' due to error 'exit status 1'
2022/07/19 10:50:39 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/core-site.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/core-site.xml
2022/07/19 10:50:39 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/core-site.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/core-site.xml
2022/07/19 10:50:40 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/core-site.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/core-site.xml
2022/07/19 10:50:40 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/dremio-env /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio-env
2022/07/19 10:50:40 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/core-site.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/core-site.xml
2022/07/19 10:50:40 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/dremio-env /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio-env
2022/07/19 10:50:40 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/dremio-env to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio-env
2022/07/19 10:50:40 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/dremio.conf /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio.conf
2022/07/19 10:50:40 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/dremio-env to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio-env
2022/07/19 10:50:40 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/dremio.conf /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio.conf
2022/07/19 10:50:41 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/dremio.conf to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio.conf
2022/07/19 10:50:41 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/logback-access.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-access.xml
2022/07/19 10:50:41 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/dremio.conf to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio.conf
2022/07/19 10:50:41 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/logback-access.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-access.xml
2022/07/19 10:50:42 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/logback-access.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-access.xml
2022/07/19 10:50:42 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/logback-admin.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-admin.xml
2022/07/19 10:50:42 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/logback-access.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-access.xml
2022/07/19 10:50:42 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/logback-admin.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-admin.xml
2022/07/19 10:50:42 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/logback-admin.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-admin.xml
2022/07/19 10:50:42 args kubectl cp -n default -c dremio-executor dremio-executor-0:/opt/dremio/conf/..data/logback.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback.xml
2022/07/19 10:50:42 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/logback-admin.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-admin.xml
2022/07/19 10:50:42 args kubectl cp -n default -c dremio-executor dremio-executor-1:/opt/dremio/conf/..data/logback.xml /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback.xml
2022/07/19 10:50:43 capture.go:188: INFO: host default.dremio-executor-1 copied /opt/dremio/conf/..data/logback.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback.xml
2022/07/19 10:50:43 args kubectl exec -it -n default -c dremio-executor dremio-executor-1 -- ls -1 /var/log/dremio
2022/07/19 10:50:43 capture.go:188: INFO: host default.dremio-executor-0 copied /opt/dremio/conf/..data/logback.xml to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback.xml
2022/07/19 10:50:43 args kubectl exec -it -n default -c dremio-executor dremio-executor-0 -- ls -1 /var/log/dremio
2022/07/19 10:50:43 capture.go:78: INFO: host default.dremio-executor-1 finished finding files to copy out of the log directory
2022/07/19 10:50:43 args kubectl cp -n default -c dremio-executor dremio-executor-1:/var/log/dremio/server.out /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/log/server.out
2022/07/19 10:50:43 capture.go:78: INFO: host default.dremio-executor-0 finished finding files to copy out of the log directory
2022/07/19 10:50:43 args kubectl cp -n default -c dremio-executor dremio-executor-0:/var/log/dremio/server.out /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/log/server.out
2022/07/19 10:50:44 capture.go:188: INFO: host default.dremio-executor-1 copied /var/log/dremio/server.out to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/log/server.out
2022/07/19 10:50:44 capture.go:188: INFO: host default.dremio-executor-0 copied /var/log/dremio/server.out to /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/log/server.out
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/core-site.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio-env
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/dremio.conf
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-access.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback-admin.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/conf/logback.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-1/log/server.out
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/core-site.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio-env
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/dremio.conf
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-access.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback-admin.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/conf/logback.xml
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/executors/default.dremio-executor-0/log/server.out
2022/07/19 10:50:44 taring file /var/folders/mn/qjh1d6d52hv5k5x8274lz46m0000gn/T/3228627356/summary.json
2022/07/19 10:50:44 gzipping file eks-diag.tar.tar into eks-diag.tar.gz

Here is where the problem happens:

2022/07/19 10:50:39 args kubectl exec -it -n default -c dremio-coordinator dremio-master-0 -- ls -1 /var/log/dremio

It should be

% kubectl exec -it -n default -c dremio-master-coordinator dremio-master-0 -- ls -1 /var/log/dremio
server.out

The pod's default container is dremio-master-coordinator, not dremio-coordinator, so commands that pass -c dremio-coordinator fail.
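Rather than hardcoding the container name, one option is to ask kubectl for the pod's container names and pick one. The jsonpath expression is standard kubectl output formatting; `containerNamesArgs`, `pickContainer` and the role-hint matching are hypothetical. The fallback to the first container is normally what kubectl itself picks when no default container is set.

```go
package main

import (
	"fmt"
	"strings"
)

// containerNamesArgs returns kubectl arguments that print the names of a pod's
// regular (non-init) containers, separated by spaces.
func containerNamesArgs(namespace, pod string) []string {
	return []string{"get", "pod", pod, "-n", namespace, "-o", "jsonpath={.spec.containers[*].name}"}
}

// pickContainer returns the first container whose name contains roleHint
// (e.g. "coordinator" or "executor"), else the first container, else "".
func pickContainer(names []string, roleHint string) string {
	for _, n := range names {
		if strings.Contains(n, roleHint) {
			return n
		}
	}
	if len(names) > 0 {
		return names[0]
	}
	return ""
}

func main() {
	fmt.Println("kubectl " + strings.Join(containerNamesArgs("default", "dremio-master-0"), " "))
	// output the command above would give for the pod in this issue
	names := strings.Fields("dremio-master-coordinator")
	c := pickContainer(names, "coordinator")
	fmt.Println("kubectl exec -n default -c " + c + " dremio-master-0 -- ls -1 /var/log/dremio")
}
```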

EKS - dates returned from files are all 0 epoch

The modification dates on files the collector copies from the pods are all set to the Unix epoch (1 Jan 1970):

% ls -lR coordinators
total 0
drwxr-xr-x  4 mc  staff  128 19 Jul 11:08 default.dremio-master-0

coordinators/default.dremio-master-0:
total 0
drwxr-xr-x  8 mc  staff  256 19 Jul 11:08 conf
drwxr-xr-x  3 mc  staff   96 19 Jul 11:08 log

coordinators/default.dremio-master-0/conf:
total 64
-rw-------  1 mc  staff   167  1 Jan  1970 core-site.xml
-rw-------  1 mc  staff  2471  1 Jan  1970 dremio-env
-rw-------  1 mc  staff  1421  1 Jan  1970 dremio.conf
-rw-------  1 mc  staff  1818  1 Jan  1970 logback-access.xml
-rw-------  1 mc  staff  2050  1 Jan  1970 logback-admin.xml
-rw-------  1 mc  staff  8849  1 Jan  1970 logback.xml

coordinators/default.dremio-master-0/log:
total 8
-rw-------  1 mc  staff  901  1 Jan  1970 server.out

On the pod

 % kubectl exec dremio-master-0 -- ls -l /opt/dremio/conf
Defaulted container "dremio-master-coordinator" out of: dremio-master-coordinator, start-only-one-dremio-master (init), wait-for-zookeeper (init), chown-data-directory (init), upgrade-task (init)
total 0
lrwxrwxrwx 1 root root 20 Jul 18 13:36 core-site.xml -> ..data/core-site.xml
lrwxrwxrwx 1 root root 17 Jul 18 13:36 dremio-env -> ..data/dremio-env
lrwxrwxrwx 1 root root 18 Jul 18 13:36 dremio.conf -> ..data/dremio.conf
lrwxrwxrwx 1 root root 25 Jul 18 13:36 logback-access.xml -> ..data/logback-access.xml
lrwxrwxrwx 1 root root 24 Jul 18 13:36 logback-admin.xml -> ..data/logback-admin.xml
lrwxrwxrwx 1 root root 18 Jul 18 13:36 logback.xml -> ..data/logback.xml

when dremio collector collects nothing, write no diag file and suggest the next steps

This should be straightforward

  • count total hosts found
  • count total hosts connected

Scenarios
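A sketch of the check using the two counts above; the type, function and the suggested next steps in the messages are illustrative only, not ddc's wording.

```go
package main

import (
	"errors"
	"fmt"
)

// collectionCounts holds the two totals this issue proposes tracking.
type collectionCounts struct {
	HostsFound     int
	HostsConnected int
}

// nothingCollected returns an error carrying a suggested next step when no host
// could be collected from, so the caller can skip writing an empty diag file.
func nothingCollected(c collectionCounts) error {
	switch {
	case c.HostsFound == 0:
		return errors.New("no hosts found: check the coordinator and executor labels or IPs passed to ddc")
	case c.HostsConnected == 0:
		return fmt.Errorf("found %d hosts but could not connect to any: check ssh or kubectl access", c.HostsFound)
	default:
		return nil
	}
}

func main() {
	for _, c := range []collectionCounts{{0, 0}, {3, 0}, {3, 2}} {
		if err := nothingCollected(c); err != nil {
			fmt.Println("no diag file written:", err)
			continue
		}
		fmt.Println("writing diag file")
	}
}
```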

local-collect tests

There are currently not many tests; to get out of beta we need this command fully covered.

Check the code coverage report and aim for 85%.

consent is backwards

ddc local-collect should stop and show the consent form unless --accept-collection-consent is set; however, the inverse happens. Fix this and write a test.
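A sketch of the intended condition. ddc's actual CLI library is not shown here, so the standard `flag` package stands in for it; `consentRequired` is hypothetical. Keeping the decision in a small function makes the requested test trivial to write.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// consentRequired parses local-collect's arguments and reports whether ddc must
// stop and show the consent form: true unless --accept-collection-consent is set.
func consentRequired(args []string) (bool, error) {
	flags := flag.NewFlagSet("local-collect", flag.ContinueOnError)
	accepted := flags.Bool("accept-collection-consent", false, "accept the data collection consent form")
	if err := flags.Parse(args); err != nil {
		return false, err
	}
	return !*accepted, nil
}

func main() {
	show, err := consentRequired(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if show {
		fmt.Println("showing consent form; rerun with --accept-collection-consent to collect")
		return
	}
	fmt.Println("consent accepted, starting collection")
}
```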

Permission denied on find command aborts subsequent file copy

Running a find command often hits a "Permission denied" error. Currently this makes the collector give up early instead of still considering the files that were found for copying, which is what should happen.

Example

find: ‘/opt/dremio/data/lost+found’: Permission denied
command terminated with exit code 1
2022/11/10 09:56:30 capture.go:116: INFO: host default.dremio-master-0 finished finding files to copy out of the log directory
2022/11/10 09:56:30 capture.go:114: ERROR: host default.dremio-executor-0 unable to find files in directory /opt/dremio/data with error file search failed failed due to error unable to start command 'kubectl exec -n default -c dremio-executor dremio-executor-0 -- find /opt/dremio/data/ -maxdepth 3 -type f -mtime -1' due to error 'exit status 1'
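A sketch of keeping partial results: `exec.Cmd.Output` returns whatever was written to stdout even when the command exits non-zero, so only a failure to start the command needs to be fatal. `findFiles` is hypothetical, and the `sh -c` script in `main` stands in for the `kubectl exec ... -- find` call.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
	"os/exec"
	"strings"
)

// findFiles runs a file-listing command and keeps every path it printed, even
// when the command exits non-zero, as find does after "Permission denied".
// Only a failure to start the command is returned as an error.
func findFiles(name string, args ...string) ([]string, error) {
	cmd := exec.Command(name, args...)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	out, err := cmd.Output()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return nil, err // the command could not be started at all
	}
	if exitErr != nil {
		log.Printf("WARNING: %s exited with code %d, keeping partial results: %s",
			name, exitErr.ExitCode(), strings.TrimSpace(stderr.String()))
	}
	var files []string
	for _, line := range strings.Split(string(out), "\n") {
		if line != "" {
			files = append(files, line)
		}
	}
	return files, nil
}

func main() {
	files, err := findFiles("sh", "-c",
		"echo /opt/dremio/data/example.log; echo 'find: /opt/dremio/data/lost+found: Permission denied' >&2; exit 1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(files)
}
```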

dynamically find the gc.log

Especially on Kubernetes this can be in a variety of places, so we should add special logic that reads the GC flag and collects that file plus any files with the same prefix at that location.

Add the command used to summary.json

It would be useful to see the command the user executed in the summary.json file, to understand why some pods/nodes might have been missed during collection.

Consent formatting is bad (forgot the tabs for the item list)

	Dremio Data Collection Consent Form

	Introduction

	Dremio ("we", "us", "our") requests your consent to collect and use certain data files from your device for the purposes of diagnostics. We take your privacy seriously and will only use these files to improve our services and troubleshoot any issues you may be experiencing. 

	Data Collection and Use

	We would like to collect the following files from your device:
	* the following system tables: \"tables\",boot,fragments,jobs,materializations,membership,memory,nodes,options,privileges,reflection_dependencies,reflections,refreshes,roles,services,slicing_threads,table_statistics,threads,version,views,cache.datasets,cache.mount_points,cache.objects,cache.storage_plugins
* df -h output
* 25000 job profiles randomly selected
* queries.json files
* dremio-env, dremio.conf, logback.xml, and logback-access.xml
* usage statistics on the internal Key Value Store (KVStore)
* list of all sources, their type and name
* server.log including any archived versions, and server.out
* dremio-env, dremio.conf, logback.xml, and logback-access.xml
* reflection.log including archived versions
* access.log including archived versions
* all gc.log files produced by dremio
* Work Load Manager queue names and rule names
* Java thread dumps collected via jstack


	Please note that the files we collect may contain confidential data. We will minimize the collection of confidential data wherever possible and will anonymize the data where feasible. 

We will use these files to:

1. Identify and diagnose problems with our products or services that you are using.
2. Improve our products and services.
3. Carry out other purposes that we will disclose to you at the time we collect the files.

Consent

By clicking "I Agree", you grant us permission to access, collect, store, and use the files listed above from your device for the purposes outlined.

Withdrawal of Consent

You have the right to withdraw your consent at any time. If you wish to do so, please contact us at [email protected]. Upon receipt of your withdrawal request, we will stop collecting new files and will delete any files we have already collected, unless we are required by law to retain them.

Changes to this Consent Form

We reserve the right to update this consent form from time to time.

By running ddc with the --accept-collection-consent flag, you acknowledge that you have read, understood, and agree to the data collection practices described in this consent form.

remove k8s collect from local-collect

While this seems handy on paper, in reality it doesn't make much sense from the local node unless kubectl has been installed. While we could include the k8s API and client, this could complicate things greatly, so for now we need to remove all references to k8s from local-collect until we have decided on a good support and versioning strategy.

Primary concerns:

  • flags referencing k8s
  • documentation or code mentioning k8s

collect kubernetes configuration when possible

  • collect describe pod, get pod -o yaml, and get pod -o json output for each pod
  • collect the controlling StatefulSet the same way (get -o yaml, get -o json, and describe)
  • if possible, use the Helm chart defaults to attempt to collect all the known items the Helm chart configures

recursively search all log and conf directories

At the moment ddc only collects the top-level log and config directories.

We could check the type of each entry individually, but doing so per file will slow collection pretty substantially:

kubectl exec -it -n default -c dremio-executor dremio-executor-1 -- bash -c 'stat --printf=%F /var/log/dremio'
directory

Support for scale out coordinators in K8s

Currently we only support one coordinator. To get logs from scale-out coordinators in K8s we would need a unique pod label and the ability to pass it as an argument to the collection.

Ideally we would just have a list of labels, collect everything from that list of labels and IPs, and detect each node's role from its config or some other means.

limit log collection by age

Optionally limit log collection by age, for example:

--only-collect-logs-newer-than 5d
--only-collect-logs-newer-than 2h

This should probably be based on file modified time, and it depends on #9 being done.

capture heap dump

This is purely optional and should come with a warning about size. Perhaps push it into a separate tarball?

tmp dir might run out of space if partition is too small

By default we ask the OS for a tmp dir, which is mostly fine, but if the user's tmp filesystem is too small or already too full, it is quite possible to run out of space.

We need to add a command line flag to override this. For now, though, the user can collect one executor at a time; both -c and -e must be given, but they don't all have to be valid IPs / labels.

Copy the Linux version of ddc to all remote nodes, then execute it and capture the results

This involves several major changes at once:

  • remove the existing capture and collect code for ddc
  • check whether there is an existing install of ddc on the remote node
  • if the remote node has ddc, check whether its version matches the current one
  • download the Linux version of ddc if it is not present on the remote node, or not up to date
  • keep the Kubernetes capture in place
  • tar each remote tar.gz into a local directory (but in a thread-safe way)
