
collector's Introduction

StackRox Kubernetes Security Platform

The StackRox Kubernetes Security Platform performs a risk analysis of the container environment, delivers visibility and runtime alerts, and provides recommendations to proactively improve security by hardening the environment. StackRox integrates with every stage of the container lifecycle: build, deploy, and runtime.

The StackRox Kubernetes Security platform is built on the foundation of the product formerly known as Prevent, which itself was called Mitigate and Apollo. You may find references to these previous names in code or documentation.


Community

You can reach out to us through Slack (#stackrox). For alternative ways to engage, stop by our Community Hub at stackrox.io.

For event updates, blogs and other resources follow the StackRox community site at stackrox.io.

For the StackRox Code of Conduct.

To report a vulnerability or bug.


Deploying StackRox

Quick Installation using Helm

StackRox offers quick installation via Helm charts. Follow the Helm Installation Guide to get the helm CLI on your system. Then run the helm quick installation script, or proceed to the Manual Installation using Helm section for configuration options.

Install StackRox via Helm Installation Script
/bin/bash <(curl -fsSL https://raw.githubusercontent.com/stackrox/stackrox/master/scripts/quick-helm-install.sh)

A default deployment of StackRox has certain CPU and memory requests and may fail on small (e.g. development) clusters if sufficient resources are not available. You may use the --small command-line option in order to install StackRox on smaller clusters with limited resources. Using this option is not recommended for production deployments.

/bin/bash <(curl -fsSL https://raw.githubusercontent.com/stackrox/stackrox/master/scripts/quick-helm-install.sh) --small

The script adds the StackRox helm repository, generates an admin password, installs stackrox-central-services, creates an init bundle for provisioning stackrox-secured-cluster-services, and finally installs stackrox-secured-cluster-services on the same cluster.

Finally, the script will automatically open the browser and log you into StackRox. A certificate warning may be displayed since the certificate is self-signed. See the Accessing the StackRox User Interface (UI) section to read more about the warnings. After authenticating you can access the dashboard using https://localhost:8000/main/dashboard.

Manual Installation using Helm

StackRox offers quick installation via Helm Charts. Follow the Helm Installation Guide to get the helm CLI on your system.

Deploying using Helm consists of four steps:

  1. Add the StackRox repository to Helm
  2. Launch StackRox Central Services using helm
  3. Create a cluster configuration and a service identity (init bundle)
  4. Deploy the StackRox Secured Cluster Services using that configuration and those credentials (this step can be done multiple times to add more clusters to the StackRox Central Service)
Install StackRox Central Services

Default Central Installation

First, the StackRox Central Services will be added to your Kubernetes cluster. This includes the UI and Scanner. To start, add the stackrox/helm-charts/opensource repository to Helm.

helm repo add stackrox https://raw.githubusercontent.com/stackrox/helm-charts/main/opensource/

To see all available Helm charts in the repo, run the following (you may add the --devel option to show non-release builds as well):

helm search repo stackrox

To install stackrox-central-services, you will need a secure password. This password will be needed later for UI login and when creating an init bundle.

STACKROX_ADMIN_PASSWORD="$(openssl rand -base64 20 | tr -d '/=+')"

From here, you can install stackrox-central-services to get Central and Scanner components deployed on your cluster. Note that you need only one deployed instance of stackrox-central-services even if you plan to secure multiple clusters.

helm upgrade --install -n stackrox --create-namespace stackrox-central-services \
  stackrox/stackrox-central-services \
  --set central.adminPassword.value="${STACKROX_ADMIN_PASSWORD}"
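
Central and Scanner pods can take a few minutes to become ready. As a quick check (plain kubectl, nothing StackRox-specific), you can watch the pods in the stackrox namespace:

# Watch the Central and Scanner pods come up
kubectl -n stackrox get pods -w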

Install Central in Clusters With Limited Resources

If you're deploying StackRox on nodes with limited resources such as a local development cluster, run the following command to reduce StackRox resource requirements. Keep in mind that these reduced resource settings are not suited for a production setup.

helm upgrade -n stackrox stackrox-central-services stackrox/stackrox-central-services \
  --set central.resources.requests.memory=1Gi \
  --set central.resources.requests.cpu=1 \
  --set central.resources.limits.memory=4Gi \
  --set central.resources.limits.cpu=1 \
  --set central.db.resources.requests.memory=1Gi \
  --set central.db.resources.requests.cpu=500m \
  --set central.db.resources.limits.memory=4Gi \
  --set central.db.resources.limits.cpu=1 \
  --set scanner.autoscaling.disable=true \
  --set scanner.replicas=1 \
  --set scanner.resources.requests.memory=500Mi \
  --set scanner.resources.requests.cpu=500m \
  --set scanner.resources.limits.memory=2500Mi \
  --set scanner.resources.limits.cpu=2000m
Install StackRox Secured Cluster Services

Default Secured Cluster Installation

Next, the secured cluster components will need to be deployed to collect information from the Kubernetes nodes.

Generate an init bundle containing initialization secrets. The init bundle will be saved in stackrox-init-bundle.yaml, and you will use it to provision secured clusters as shown below.

kubectl -n stackrox exec deploy/central -- roxctl --insecure-skip-tls-verify \
  --password "${STACKROX_ADMIN_PASSWORD}" \
  central init-bundles generate stackrox-init-bundle --output - > stackrox-init-bundle.yaml
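
If you want to confirm the bundle was registered, roxctl also has a matching list subcommand; the invocation below reuses the same kubectl exec pattern and assumes your roxctl version provides central init-bundles list:

kubectl -n stackrox exec deploy/central -- roxctl --insecure-skip-tls-verify \
  --password "${STACKROX_ADMIN_PASSWORD}" \
  central init-bundles list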

Set a meaningful cluster name for your secured cluster in the CLUSTER_NAME shell variable. The cluster will be identified by this name in the clusters list of the StackRox UI.

CLUSTER_NAME="my-secured-cluster"

Then install stackrox-secured-cluster-services (with the init bundle you generated earlier) using this command:

helm upgrade --install --create-namespace -n stackrox stackrox-secured-cluster-services stackrox/stackrox-secured-cluster-services \
  -f stackrox-init-bundle.yaml \
  --set clusterName="$CLUSTER_NAME" \
  --set centralEndpoint="central.stackrox.svc:443"

When deploying stackrox-secured-cluster-services on a different cluster than the one where stackrox-central-services is deployed, you will also need to specify the endpoint (address and port number) of Central via the --set centralEndpoint=<endpoint_of_central_service> command-line argument.
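
For example, securing a second, remote cluster might look like the sketch below; the cluster name and endpoint are placeholders that you would replace with your own values (run this against the kube context of the cluster being secured):

helm upgrade --install --create-namespace -n stackrox stackrox-secured-cluster-services \
  stackrox/stackrox-secured-cluster-services \
  -f stackrox-init-bundle.yaml \
  --set clusterName="my-remote-cluster" \
  --set centralEndpoint="central.example.com:443"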

Install Secured Cluster with Limited Resources

When deploying StackRox Secured Cluster Services on a small node, you can pass additional options to reduce their resource requirements. Keep in mind that these reduced resource settings are not recommended for a production setup.

helm install -n stackrox stackrox-secured-cluster-services stackrox/stackrox-secured-cluster-services \
  -f stackrox-init-bundle.yaml \
  --set clusterName="$CLUSTER_NAME" \
  --set centralEndpoint="central.stackrox.svc:443" \
  --set sensor.resources.requests.memory=500Mi \
  --set sensor.resources.requests.cpu=500m \
  --set sensor.resources.limits.memory=500Mi \
  --set sensor.resources.limits.cpu=500m
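
To confirm the secured cluster components came up, you can wait for the rollouts with standard kubectl; this assumes the default workload names sensor and collector:

kubectl -n stackrox rollout status deploy/sensor
kubectl -n stackrox rollout status daemonset/collector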
Additional information about Helm charts

To further customize your Helm installation, consult the charts' own documentation.

Installation via Scripts

The deploy script will:

  1. Launch StackRox Central Services
  2. Create a cluster configuration and a service identity
  3. Deploy the StackRox Secured Cluster Services using that configuration and those credentials

You can set the environment variable MAIN_IMAGE_TAG in your shell to ensure that you get the version you want.
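
For example, a minimal sketch of pinning the version before running the orchestrator-specific deploy script shown below (the tag is a placeholder; substitute a real release tag):

export MAIN_IMAGE_TAG=<desired-version>
./deploy/deploy.sh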

If you check out a commit, the scripts will launch the image corresponding to that commit by default. The image will be pulled if needed.

Further steps are orchestrator specific.

Kubernetes Distributions (EKS, AKS, GKE)

Click to expand

Follow the guide below to quickly deploy StackRox to your Kubernetes cluster in the stackrox namespace. If you want to install a specific version, set it in MAIN_IMAGE_TAG; otherwise, the latest nightly build will be installed.

Run the following in your working directory of choice:

git clone git@github.com:stackrox/stackrox.git
cd stackrox
MAIN_IMAGE_TAG=VERSION_TO_USE ./deploy/deploy.sh

After a few minutes, all resources should be deployed.

Credentials for the 'admin' user can be found in the ./deploy/k8s/central-deploy/password file.

Note: While the password file is stored in plaintext on your local filesystem, the Kubernetes Secret StackRox uses is encrypted, and you will not be able to alter the secret at runtime. If you lose the password, you will have to redeploy central.

OpenShift

Click to Expand

Before deploying on OpenShift, ensure that you have oc, the OpenShift command-line tool, installed.

Follow the guide below to quickly deploy a specific version of StackRox to your OpenShift cluster in the stackrox namespace. Make sure to add the most recent tag to the MAIN_IMAGE_TAG variable.

Run the following in your working directory of choice:

git clone git@github.com:stackrox/stackrox.git
cd stackrox
MAIN_IMAGE_TAG=VERSION_TO_USE ./deploy/deploy.sh

After a few minutes, all resources should be deployed. The process will complete with this message:

Credentials for the 'admin' user can be found in the ./deploy/openshift/central-deploy/password file.

Note: While the password file is stored in plaintext on your local filesystem, the Kubernetes Secret StackRox uses is encrypted, and you will not be able to alter the secret at runtime. If you lose the password, you will have to redeploy central.

Docker Desktop, Colima, or minikube

Click to Expand

Run the following in your working directory of choice:

git clone git@github.com:stackrox/stackrox.git
cd stackrox
MAIN_IMAGE_TAG=latest ./deploy/deploy-local.sh

After a few minutes, all resources should be deployed.

Credentials for the 'admin' user can be found in the ./deploy/k8s/deploy-local/password file.

Accessing the StackRox User Interface (UI)

Click to expand

After the deployment has completed (Helm or script install), a port-forward should exist so you can connect to https://localhost:8000/. If it is not available, run the following:

kubectl port-forward -n 'stackrox' svc/central "8000:443"

Then go to https://localhost:8000/ in your web browser.

Username: The default user is admin.

Password (Helm): The password is in $STACKROX_ADMIN_PASSWORD after a manual installation, or printed at the end of the quick installation script.

Password (Script): The password is located in the deploy/<orchestrator>/central-deploy/password file created by the script install.
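
As a quick sanity check after a script install, you can hit the metadata endpoint with the generated admin password; this assumes the port-forward above is running, and uses the /v1/metadata endpoint also mentioned in the Productivity section (adjust the password path for your orchestrator):

curl -sk -u "admin:$(cat deploy/k8s/central-deploy/password)" https://localhost:8000/v1/metadata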


Development

Quickstart

Build Tooling

The following tools are necessary to test code and build image(s):

Click to expand
  • Make
  • Go
  • Various Go linters that can be installed using make reinstall-dev-tools.
  • UI build tooling as specified in ui/README.md.
  • Docker
    • Note: Docker Desktop now requires a paid subscription for larger, enterprise companies.
    • Some StackRox devs recommend Colima
  • Xcode command line tools (macOS only)
  • Bats is used to run certain shell tests. You can obtain it with brew install bats or npm install -g bats.
  • oc OpenShift cli tool
  • shellcheck for shell scripts linting.

Xcode - macOS Only

Usually you will already have these installed via brew. However, if you get an error when building the golang x/tools, first make sure the Xcode EULA has been accepted by:

  1. starting Xcode
  2. building a new blank app project
  3. starting the blank project app in the emulator
  4. closing both the emulator and Xcode, then
  5. running the following commands:
xcode-select --install
sudo xcode-select --switch /Library/Developer/CommandLineTools # Enable command line tools
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

For more info, see nodejs/node-gyp#569

Clone StackRox

Click to expand
# Create a GOPATH: this is the location of your Go "workspace".
# (Note that it is not – and must not be – the same as the path Go is installed to.)
# The default is to have it in ~/go/, or ~/development, but anything you prefer goes.
# Whatever you decide, create the directory, set GOPATH, and update PATH:
export GOPATH=$HOME/go # Change this if you choose to use a different workspace.
export PATH=$PATH:$GOPATH/bin
# You probably want to permanently set these by adding the following commands to your shell
# configuration (e.g. ~/.bash_profile)

cd $GOPATH
mkdir -p bin pkg
mkdir -p src/github.com/stackrox
cd src/github.com/stackrox
git clone git@github.com:stackrox/stackrox.git

Local Development

Click to expand

To sweeten your experience, install the workflow scripts beforehand.

$ cd $GOPATH/src/github.com/stackrox/stackrox
$ make install-dev-tools
$ make image

Now, you need to bring up a Kubernetes cluster yourself before proceeding. Development can happen either in GCP or locally with Docker Desktop, Colima, or minikube. Note that Docker Desktop and Colima are better suited for macOS development, because the cluster will have access to images built locally with make image without additional configuration. Also, Collector has better support for these than for minikube, where drivers may not be available.

# To keep the StackRox Central's Postgres DB state between database upgrades and restarts, set:
$ export STORAGE=pvc

# To save time on rebuilds by skipping UI builds, set:
$ export SKIP_UI_BUILD=1

# To save time on rebuilds by skipping CLI builds, set:
$ export SKIP_CLI_BUILD=1

# When you deploy locally make sure your kube context points to the desired kubernetes cluster,
# for example Docker Desktop.
# To check the current context you can call a workflow script:
$ roxkubectx

# To deploy locally, call:
$ ./deploy/deploy-local.sh

# Now you can access StackRox dashboard at https://localhost:8000
# or simply call another workflow script:
$ logmein

See Installation via Scripts for further reading. To read more about the environment variables, consult deploy/README.md.

Common Makefile Targets

Click to expand
# Build image, this will create `stackrox/main` with a tag defined by `make tag`.
$ make image

# Compile all binaries
$ make main-build-dockerized

# Displays the docker image tag which would be generated
$ make tag

# Note: there are integration tests in some components, and we currently
# run those manually. They will be re-enabled at some point.
$ make test

# Apply and check style standards in Go and JavaScript
$ make style

# enable pre-commit hooks for style checks
$ make init-githooks

# Compile and restart only central
$ make fast-central

# Compile only sensor
$ make fast-sensor

# Only compile protobuf
$ make proto-generated-srcs

Productivity

Click to expand

The workflow repository contains some helper scripts which support our development workflow. Explore more commands with roxhelp --list-all.

# Change directory to rox root
$ cdrox

# Handy curl shortcut for your StackRox central instance
# Uses https://localhost:8000 by default or ROX_BASE_URL env variable
# Also uses the admin credentials from your last deployment via deploy.sh
$ roxcurl /v1/metadata

# Run quickstyle checks, faster than stackrox's "make style"
$ quickstyle

# The workflow repository includes some tools for supporting
# working with multiple inter-dependent branches.
# Examples:
$ smart-branch <branch-name>    # create new branch
    ... work on branch...
$ smart-rebase                  # rebase from parent branch
    ... continue working on branch...
$ smart-diff                    # check diff relative to parent branch
    ... git push, etc.

GoLand Configuration

Click to expand

If you're using GoLand for development, the following can help improve the experience.

Make sure the Protocol Buffers plugin is installed. The plugin comes installed by default in GoLand. If it isn't, use Help | Find Action..., type Plugins and hit enter, then switch to Marketplace, type its name and install the plugin.

By default, this plugin does not know where to look for .proto imports in GoLand, therefore you need to explicitly configure paths for it. See https://github.com/jvolkman/intellij-protobuf-editor#path-settings.

  • Go to GoLand | Preferences | Languages & Frameworks | Protocol Buffers.
  • Uncheck Configure automatically.
  • Click on + button, navigate and select ./proto directory in the root of the repo.
  • Optionally, also add the version-suffixed gogo module directories under $HOME/go/pkg/mod/github.com/gogo/ (e.g. the protobuf module path).
  • To verify: use menu Navigate | File... type any .proto file name, e.g. alert_service.proto, and check that all import strings are shown green, not red.

Running sql_integration tests

Click to expand

Go tests annotated with //go:build sql_integration require a PostgreSQL server listening on port 5432. Due to how authentication is set up in the code, it is easiest to start Postgres in a container like this:

$ docker run --rm --env POSTGRES_USER="$USER" --env POSTGRES_HOST_AUTH_METHOD=trust --publish 5432:5432 docker.io/library/postgres:13

With that running in the background, sql_integration tests can be triggered from IDE or command-line.
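
From the command line, the standard Go build-tag mechanism applies; a minimal sketch, assuming you run it from the repository root against a package that contains such tests (the package path is illustrative):

# Run only the tests guarded by //go:build sql_integration in the chosen packages
$ go test -tags sql_integration ./path/to/package/...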

Debugging

Click to expand

Kubernetes debugger setup

With GoLand, you can naturally use breakpoints and the debugger when running unit tests in IDE.

If you would like to debug local or even remote deployment, follow the procedure below.

  1. Create debug build locally by exporting DEBUG_BUILD=yes:
    $ DEBUG_BUILD=yes make image
    Alternatively, a debug build will also be created when the branch name contains the -debug substring. This works locally with make image and in CI.
  2. Deploy the image using instructions from this README file. Works both with deploy-local.sh and deploy.sh.
  3. Start the debugger (and port forwarding) in the target pod using the roxdebug command from the workflow repo.
    # For Central
    $ roxdebug
    # For Sensor
    $ roxdebug deploy/sensor
    # See usage help
    $ roxdebug --help
  4. Configure GoLand for remote debugging (should be done only once):
    1. Open Run | Edit Configurations …, click on the + icon to add a new configuration, and choose the Go Remote template.
    2. Choose Host: localhost and Port: 40000. Give this configuration a name.
    3. Select On disconnect: Leave it running (this prevents GoLand from forgetting breakpoints on reconnect).
  5. Attach GoLand to the debugging port: select Run | Debug… and choose the configuration you've created. If everything is set up correctly, you should see a Connected message in the Debug | Debugger | Variables window at the lower part of the screen.
  6. Set some code breakpoints, trigger corresponding actions and happy debugging!

See Debugging go code running in Kubernetes for more info.
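
If you prefer the command line over GoLand, the same forwarded port can also be used with the Delve client; a minimal sketch, assuming roxdebug exposes the debugger on localhost:40000 as described above:

$ dlv connect localhost:40000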

Generating Portable Installers

Kubernetes
docker run -i --rm quay.io/stackrox-io/main:<tag> central generate interactive > k8s.zip

This will run you through an installer and generate a k8s.zip file.

unzip k8s.zip -d k8s
bash k8s/central.sh

Now Central has been deployed. Use the UI to deploy Sensor.

OpenShift

Note: If using a host mount, you need to allow the container to access it by using sudo chcon -Rt svirt_sandbox_file_t <full volume path>

Take the image-setup.sh script from this repo and run it to do the pull/push to local OpenShift registry. This is a prerequisite for every new cluster.

bash image-setup.sh
docker run -i --rm quay.io/stackrox-io/main:<tag> central generate interactive > openshift.zip

This will run you through an installer and generate an openshift.zip file.

unzip openshift.zip -d openshift
bash openshift/central.sh

Dependencies and Recommendations for Running StackRox

Click to Expand

The following information has been gathered to help with the installation and operation of the open source StackRox project. These recommendations were developed for the Red Hat Advanced Cluster Security for Kubernetes product and have not been tested with the upstream StackRox project.

Recommended Kubernetes Distributions

The Kubernetes Platforms that StackRox has been deployed onto with minimal issues are listed below.

  • Red Hat OpenShift Dedicated (OSD)
  • Azure Red Hat OpenShift (ARO)
  • Red Hat OpenShift Service on AWS (ROSA)
  • Amazon Elastic Kubernetes Service (EKS)
  • Google Kubernetes Engine (GKE)
  • Microsoft Azure Kubernetes Service (AKS)

If you deploy into a Kubernetes distribution other than the ones listed above you may encounter issues.

Recommended Operating Systems

StackRox is known to work on the recent versions of the following operating systems.

  • Ubuntu
  • Debian
  • Red Hat Enterprise Linux (RHEL)
  • CentOS
  • Fedora CoreOS
  • Flatcar Container Linux
  • Google COS
  • Amazon Linux
  • Garden Linux

Recommended Web Browsers

The following list shows the browsers that can be used to view the StackRox web user interface.

  • Google Chrome 88.0 (64-bit)
  • Microsoft Edge
    • Version 44 and later (Windows)
    • Version 81 (Official build) (64-bit)
  • Safari on MacOS (Mojave) - Version 14.0
  • Mozilla Firefox Version 82.0.2 (64-bit)

collector's People

Contributors

03cranec, aleks-f, bwolmarans, dkogan, erthalion, evgeni, gavin-stackrox, gianlucaborello, gregkh, henridf, jarun, joshdk, joukovirtanen, kristopolous, ldegio, ltagliamonte, luca3m, misberner, molter73, mstemm, ovalenti, red-hat-konflux[bot], ret2libc, rhacs-bot, robbycochran, roxbot, samuelepilleri, srudi88, stringy, tommartensen


collector's Issues

CrashLoopBackOff in Collector's DaemonSet on OpenShift 4.9

Hello Team, I received the following error while deploying the collector on OpenShift 4.9. I initially thought this was a permission issue and added the required SCC to the collector's service account, but the issue persists.

terminate called after throwing an instance of 'scap_open_exception'
  what():  can't create map: Permission denied
collector[0x448f7d]
/lib64/libc.so.6(+0x4eb80)[0x7f726981fb80]
/lib64/libc.so.6(gsignal+0x10f)[0x7f726981faff]
/lib64/libc.so.6(abort+0x127)[0x7f72697f2ea5]
/lib64/libstdc++.so.6(+0x9009b)[0x7f726a1c109b]
/lib64/libstdc++.so.6(+0x9653c)[0x7f726a1c753c]
/lib64/libstdc++.so.6(+0x96597)[0x7f726a1c7597]
/lib64/libstdc++.so.6(+0x967f8)[0x7f726a1c77f8]
/usr/local/lib/libsinsp-wrapper.so(+0x240ef5)[0x7f726c82cef5]
/usr/local/lib/libsinsp-wrapper.so(_ZN5sinsp4openERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x36)[0x7f726c866c16]
collector[0x4d2b34]
collector[0x46631c]
collector[0x442bec]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f726980bd85]
collector[0x448e2e]
Caught signal 6 (SIGABRT): Aborted
/bootstrap.sh: line 94:    10 Aborted                 eval exec "$@"

Collector eBPF probe evaluation with custom eBPF probe (on s390x)

Overview

With the eBPF enablement for s390x in Falco libs (https://github.com/hbrueckner/falcosecurity-libs/tree/mauro-pull-libs-20221115-s390x-bpf), I looked further into the custom eBPF probe introduced with #670. There I experienced a few issues that I'd like to discuss here.

My approach to getting familiar with the custom eBPF probe was to replace the Falco libs probe directly with the custom probe and use sinsp-example to install it and capture events.

Observations

BPF_SUPPORTS_RAW_TRACEPOINTS

The BPF_SUPPORTS_RAW_TRACEPOINTS macro is explicitly undefined in line 12. While this has effects on the raw syscalls enter/exit (catch-all), it also disables the functionality of the sched_process_fork tracepoint. This tracepoint is actually required on s390x and aarch64 to capture newly created child process information, as both architectures do not generate these events in the fork/vfork/cloneX system calls. See here for more details. Finally, my question here is whether collector and the upper layers need to obtain child process events.

Additional background: The raw tracepoints support is required to get access to the TP_PROTO arguments, in particular, to the struct task_struct of the parent and child processes. Those are required by the respective Falco libs filler (child and parent).

COLLECTOR_PROBEs

Looking into those collector probes and reviewing the related PR #670, I have a few questions.

Instead of using the raw system call enter/exit events, those COLLECTOR_PROBEs install BPF programs directly on the specified system call, e.g., chdir. This approach is very nice for focusing on specific system calls rather than hooking into every system call.

Trying those on s390x (after fixing a few BPF verifier issues) resulted in this error message:

# ./libsinsp/examples/sinsp-example -b  driver/bpf/probe.o -j
-- Try to open: 'bpf' engine.
driver/bpf/probe.o
terminate called after throwing an instance of 'scap_open_exception'
  what():  PERF_EVENT_IOC_SET_BPF: Permission denied
Aborted (core dumped)

Looking into this issue, attaching the program to the syscall fails, and looking further into the respective kernel sources, the issue seems to be that the BPF program tries to read data beyond the defined tracepoint structure (format).

For chdir, the tracepoint structure formats for its enter and exit events are defined as follows:

# cat /sys/kernel/tracing/events/syscalls/sys_enter_chdir/format 
name: sys_enter_chdir
ID: 604
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:const char * filename;    offset:16;      size:8; signed:0;

print fmt: "filename: 0x%08lx", ((unsigned long)(REC->filename))

# cat /sys/kernel/tracing/events/syscalls/sys_exit_chdir/format 
name: sys_exit_chdir
ID: 603
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:long ret; offset:16;      size:8; signed:1;

print fmt: "0x%lx", REC->ret

This basically translates to these structure definitions (there are similar structures for the raw system call enter/exit events):

struct sys_enter_chdir_args
{
        __u64 pad;
        int __syscall_nr;
        const char *filename;
};

struct sys_exit_chdir_args
{
        __u64 pad;
        int __syscall_nr;
        long ret;
};

These structures are passed as context (ctx) to the enter_probe and exit_probe functions when the tracepoint is triggered. However, those tracepoint structures are syscall specific. The Falco libs implementation, on the other hand, deals with struct sys_enter_args and struct sys_exit_args. Looking specifically at struct sys_exit_args:

struct sys_exit_args {
	__u64 pad;
	long id;
	long ret;
};

and comparing with the struct sys_exit_chdir_args above, accessing ret (the return value) would read beyond the struct sys_exit_chdir_args structure, because the Falco probe and filler implementation works on struct sys_exit_args. This seems to explain the PERF_EVENT_IOC_SET_BPF: Permission denied error from above.

With above background on syscall-specific structures vs. Falco implementation on struct sys_{enter,exit}_args, I tried to map the syscall-specific structure into the syscall enter structures, e.g. for chdir.

In probe_enter, perform the conversion before storing/stashing the arguments:

  //memcpy(stack_ctx.args, ctx->args, sizeof(ctx->args));
  switch (id)   /* maybe use sc_evt here..... */
  {
  case __NR_chdir: {
        struct sys_enter_chdir_args *chdir_args = (struct sys_enter_chdir_args *)ctx;
        stack_ctx.args[0] = (unsigned long)chdir_args->filename;
        break;
  }
  }

With that change, the original ctx is preserved and stack_ctx is a struct sys_enter_args structure containing the arguments from the syscall-specific tracepoint format. Also note that ctx needs to be passed down the stack, especially to call_filler and the fork kind of fillers, as they require the context to perform the bpf_tail_call to the subsequent filler implementation.

Using this approach (and limiting here for the enter event only), results in:

./libsinsp/examples/sinsp-example -b  driver/bpf/probe.o -j :
{"evt.args":"","evt.cpu":1,"evt.dir":">","evt.num":5,"evt.time":1669651102819446242,"evt.type":"chdir","proc.name":"bash","thread.tid":1056}

For the probe_exit function, this becomes more difficult because the system call specific tracepoint format does not match the generic system call exit args structure (int vs. long). Also Falco fillers use bpf_syscall_get_retval that operates on ctx and casts it to struct sys_exit_args.

Summary

Based on the above observations, and trying different variations with the custom probe, here is my summary for s390x:

  • No raw tracepoint support (BPF_SUPPORTS_RAW_TRACEPOINTS) + COLLECTOR_PROBE has a system call specific context structure different from Falco assumption. (See above).

  • No raw tracepoint support (BPF_SUPPORTS_RAW_TRACEPOINTS) + raw_syscalls/sys_{enter,exit} aka. COLLECTOR_LEGACY_PROBE: Works with some BPF verifier fixes but sched_process_fork filler does not work due to filler requirements to access the arguments of the tracepoint definition (= TP_PROTO). Without sched_process_fork, there will be no child process events on s390x (and aarch64). Also it has some performance effects because of generic system call probes (catch-all).

  • Supporting raw tracepoints (BPF_SUPPORTS_RAW_TRACEPOINTS) + COLLECTOR_LEGACY_PROBE. Raw tracepoint support also enables the sched_process_fork tracepoint. This is more aligned with how Falco libs works but, of course, has some performance effects because of the generic system call probes (catch-all) -- potentially a bit less due to raw tracepoint support.

I know this is quite a large write-up of my observations, and I hope the details help to understand them.

cc: @Molter73 @kcrane @robbycochran

[ROX-15070] BTF infrastructure

The idea is to make CI pipelines support the "modern" probes. To achieve that:

  • Build Falco with modern bpf enabled #1068
  • Create a new testing pipeline for "modern" probes #1067
  • (Optional for MVP) Modify the build pipeline to create and embed the new probe into an image #1019
  • (Optional for MVP) To make the solution more robust, include the new probe in the regular probe delivery channels as well (so that Collector can try to fetch those probes in an alternative way)
  • (Optional for MVP) Investigate possibility of using btfhub

See ROX-15070 for the higher-level details.

Segmentation Fault on all nodes in OpenShift 4.9.33

Hi,

Disclaimer: I have opened the same issue at stackrox/stackrox#3195 because I am not sure which repository this should be tracked in, as here we have an area/collector label. Please close the one that is at the wrong location.

We are experiencing crashes in collector containers across all nodes in one of our OpenShift clusters.

Debug Log:

Collector Version: 3.9.0
OS: Red Hat Enterprise Linux CoreOS 49.84.202205050701-0 (Ootpa)
Kernel Version: 4.18.0-305.45.1.el8_4.x86_64
Starting StackRox Collector...
[I 20220926 112218 HostInfo.cpp:126] Hostname: '<redacted>'
[I 20220926 112218 CollectorConfig.cpp:119] User configured logLevel=debug
[I 20220926 112218 CollectorConfig.cpp:149] User configured collection-method=kernel_module
[I 20220926 112218 CollectorConfig.cpp:206] Afterglow is enabled
[D 20220926 112218 HostInfo.cpp:200] EFI directory exist, UEFI boot mode
[D 20220926 112218 HostInfo.h:100] identified kernel release: '4.18.0-305.45.1.el8_4.x86_64'
[D 20220926 112218 HostInfo.h:101] identified kernel version: '#1 SMP Wed Apr 6 13:48:37 EDT 2022'
[D 20220926 112218 HostInfo.cpp:297] SecureBoot status is 2
[D 20220926 112218 collector.cpp:254] Core dump not enabled
[I 20220926 112218 collector.cpp:302] Module version: 2.0.1
[I 20220926 112218 collector.cpp:329] Attempting to download kernel module - Candidate kernel versions:
[I 20220926 112218 collector.cpp:331] 4.18.0-305.45.1.el8_4.x86_64
[D 20220926 112218 GetKernelObject.cpp:148] Checking for existence of /kernel-modules/collector-4.18.0-305.45.1.el8_4.x86_64.ko.gz and /kernel-modules/collector-4.18.0-305.45.1.el8_4.x86_64.ko
[D 20220926 112218 GetKernelObject.cpp:151] Found existing compressed kernel object.
[I 20220926 112218 collector.cpp:262]
[I 20220926 112218 collector.cpp:263] This product uses kernel module and ebpf subcomponents licensed under the GNU
[I 20220926 112218 collector.cpp:264] GENERAL PURPOSE LICENSE Version 2 outlined in the /kernel-modules/LICENSE file.
[I 20220926 112218 collector.cpp:265] Source code for the kernel module and ebpf subcomponents is available upon
[I 20220926 112218 collector.cpp:266] request by contacting [email protected].
[I 20220926 112218 collector.cpp:267]
[I 20220926 112218 collector.cpp:162] Inserting kernel module /module/collector.ko with indefinite removal and retry if required.
[D 20220926 112218 collector.cpp:109] Kernel module arguments: s_syscallIds=26,27,56,57,246,247,248,249,94,95,14,15,156,157,216,217,222,223,4,5,22,23,12,13,154,155,172,173,214,215,230,231,282,283,288,289,292,293,96,97,182,183,218,219,224,225,16,186,234,194,195,192,193,200,201,198,199,36,37,18,19,184,185,220,221,226,227,-1 verbose=0 exclude_selfns=1 exclude_initns=1
[I 20220926 112218 collector.cpp:183] Done inserting kernel module /module/collector.ko.
[I 20220926 112218 collector.cpp:215] gRPC server=sensor.mcs-security.svc:443
[I 20220926 112218 CollectorService.cpp:50] Config: collection_method:kernel_module, useChiselCache:1, snapLen:0, scrape_interval:30, turn_off_scrape:0, hostname:<redacted>, logLevel:DEBUG
[I 20220926 112218 CollectorService.cpp:79] Network scrape interval set to 30 seconds
[I 20220926 112218 CollectorService.cpp:82] Waiting for GRPC server to become ready ...
[I 20220926 112218 CollectorService.cpp:87] GRPC server connectivity is successful
[D 20220926 112218 ConnTracker.cpp:314] ignored l4 protocol and port pairs
[D 20220926 112218 ConnTracker.cpp:316] udp/9
[I 20220926 112218 NetworkStatusNotifier.cpp:187] Started network status notifier.
[I 20220926 112218 NetworkStatusNotifier.cpp:203] Established network connection info stream.
[D 20220926 112218 SysdigService.cpp:262] Updating chisel and flushing chisel cache
[D 20220926 112218 SysdigService.cpp:263] New chisel:
args = {}
function on_event()
    return true
end
function on_init()
    filter = "not container.id = 'host'\n"
    chisel.set_filter(filter)
    return true
end

[I 20220926 112218 SignalServiceClient.cpp:43] Trying to establish GRPC stream for signals ...
[I 20220926 112218 SignalServiceClient.cpp:61] Successfully established GRPC stream for signals.
[D 20220926 112219 ConnScraper.cpp:406] Could not open process directory 1626873: No such file or directory
[D 20220926 112219 ConnScraper.cpp:406] Could not open process directory 1626877: No such file or directory
[W 20220926 112219 ProtoAllocator.h:41] Allocating a memory block on the heap for the arena, this is inefficient and usually avoidable
collector[0x44746d]
/lib64/libc.so.6(+0x4eb20)[0x7f8425ceeb20]
Caught signal 11 (SIGSEGV): Segmentation fault
/bootstrap.sh: line 94:    11 Segmentation fault      (core dumped) eval exec "$@"
Collector kernel module has already been loaded.
Removing so that collector can insert it at startup.

I am not sure how to debug this, as all DaemonSet containers experience this problem.

We are using StackRox 3.71.0. I have tried with collector images 3.9.0 and 3.11.0. Please reach out for any missing information.

Problem to download specific kernel module

Hi.
My AKS uses the kernel 5.4.0-1089-azure, so it needs the kernel module collector-5.4.0-1089-azure.ko.gz. But I haven't been able to find it here:

https://collector-modules.stackrox.io/612dd2ee06b660e728292de9393e18c81a88f347ec52a39207c5166b5302b656/b6745d795b8497aaf387843dc8aa07463c944d3ad67288389b754daaebea4b62/collector-5.4.0-1089-azure.ko.gz

However, other modules are there, e.g. collector-5.4.0-1086-azure.ko.gz

I think that collector-modules.stackrox.io is the right domain to look up new modules. My StackRox version is 3.0.62.0.

I want to report this behavior and request that the mentioned module be uploaded. I appreciate any help you can provide.

[ROX-15068] Enable modern Falco probe engine

The idea is to make "modern" Falco probe engine available for Collector. To achieve that:

  • Syncing up our Falco fork with the latest changes #1016
  • Update Collector probe loader to recognize and load new type of probes #1065
  • Modern Falco probes support only relatively new kernel versions (>= 5.8); the lower boundary is enforced via version checking. Figure out whether it's possible to lower it for older kernels where the necessary features were backported #1037
  • Customize modern Falco probes to load only a certain subset of tail-called programs to avoid unsupported syscalls problems #1044
  • Make sure sinsp-example can be used for testing, and specify a few tests for the new probe in isolation (optionally) #1043
  • Implement a standalone Collector for testing, and specify a few tests for the new probe with Collector only (optionally)

See ROX-15068 for the higher-level details.

Identify collector's required capabilities

Following removal of kernel modules, collector will be running with all capabilities (as a privileged container.) To harden the collector process, we should drop all capabilities that we don't require.

  • Identify required capabilities for running the various collector components
    • Kernel driver (eBPF/CORE bpf)
    • conn scraper
    • self-check process
  • Implement a collection-method agnostic way of dropping capabilities once startup has concluded.
  • Identify whether any mounts are no longer needed

Temporary option to not fall back for MVP

Temporarily do not fall back to eBPF if modern probes fail to load. It turns out that e.g. a verifier error does not prevent Collector from continuing. Note that in general this is good, just not for MVP.

Part of #1011

Disable Kernel Module Collection in stackrox/stackrox tests

Prevent the stackrox tests (e2e as well as unit tests) from configuring a collector with kernel-module collection, to avoid test breakages while removing/disabling kernel module collection.

Note: it is possible they are using eBPF only, due to the default, but this should be verified.

btf/vmlinux for Collector

The modern probe requires /sys/kernel/btf/vmlinux to be present and available from the Collector container. Currently this is the case at least for OCP 4.12 -- a privileged container is able to read the vmlinux file without any extra mounts. At the same time, Collector mounts the host sysfs under the /host/ path, including the required vmlinux file. What needs to be done:

  • Figure out whether we can rely on the vmlinux provided by K8s/OCP in a privileged container. Is it exactly the same as what is mounted from the host via the /host path?
  • If those are not the same, or the vmlinux file is not provided on all supported platforms, make sure it's included in the current sysfs mounts into the Collector container. If not, add it to the DaemonSet definition.
  • If vmlinux ends up at a path different from /sys/kernel/btf/vmlinux or one of the other standard locations (/boot/vmlinux..., /lib/modules/..., /usr/lib/modules/..., see libbpf for more details), the Falco modern bpf engine has to be extended to pass load options for the probe. This would probably be a separate task from this one.

Part of #1015

collector logging running amok


Our aggregated logging was exploding due to StackRox collector logging. There were millions of lines like this over the last few days, coming from 3 of 10 collector nodes.

To mitigate as fast as possible, we actually had to delete StackRox.

This is the error message that was present millions of times; it appeared on 3 of 10 collector nodes about 6-10 times every millisecond:
[E 20221209 165715 ConnScraper.cpp:415] Could not determine network namespace: No such file or directory

it was deployed like this:
/bin/bash <(curl -fsSL https://raw.githubusercontent.com/stackrox/stackrox/master/scripts/quick-helm-install.sh)

on GKE v1.24.

Figure out how to extend libbpf skeleton

For MVP we would go with a probe embedded into the binary, the usual approach with libbpf skeletons. But it's important to understand whether this could be changed in the future. Figure out how much effort it would be to load the actual BPF probe from a non-embedded file.

Part of #1014

Ubuntu 22.04.1 LTS Collector not working

Hi!

I'm trying to run StackRox on 22.04.1 LTS (Jammy Jellyfish) with Kernel Linux 5.15.0-52-generic x86_64. This Kernel is supported according to KERNEL_VERSIONS.

Looks like Collector starts normally (I use quay.io/stackrox-io/collector-slim:3.72.1):

Collector Version: 3.11.1
OS: Ubuntu 22.04.1 LTS
Kernel Version: 5.15.0-52-generic
Starting StackRox Collector...
[I 20221118 071354 HostInfo.cpp:126] Hostname: 'dsworker2'
[I 20221118 071354 CollectorConfig.cpp:149] User configured collection-method=ebpf
[I 20221118 071354 CollectorConfig.cpp:206] Afterglow is enabled
[I 20221118 071354 collector.cpp:302] Module version: 2.1.0
[I 20221118 071354 collector.cpp:329] Attempting to download eBPF probe - Candidate kernel versions:
[I 20221118 071354 collector.cpp:331] 5.15.0-52-generic
[I 20221118 071354 GetKernelObject.cpp:180] Local storage does not contain collector-ebpf-5.15.0-52-generic.o
[I 20221118 071354 GetKernelObject.cpp:194] Successfully downloaded and decompressed /module/collector-ebpf.o
[I 20221118 071354 collector.cpp:262]
[I 20221118 071354 collector.cpp:263] This product uses kernel module and ebpf subcomponents licensed under the GNU
[I 20221118 071354 collector.cpp:264] GENERAL PURPOSE LICENSE Version 2 outlined in the /kernel-modules/LICENSE file.
[I 20221118 071354 collector.cpp:265] Source code for the kernel module and ebpf subcomponents is available upon
[I 20221118 071354 collector.cpp:266] request by contacting [email protected].
[I 20221118 071354 collector.cpp:267]
[I 20221118 071354 collector.cpp:215] gRPC server=sensor.stackrox.svc:443
[I 20221118 071354 CollectorService.cpp:50] Config: collection_method:ebpf, useChiselCache:1, snapLen:0, scrape_interval:30, turn_off_scrape:0, hostname:dsworker2, logLevel:INFO
[I 20221118 071354 CollectorService.cpp:79] Network scrape interval set to 30 seconds
[I 20221118 071354 CollectorService.cpp:82] Waiting for GRPC server to become ready ...
[I 20221118 071354 CollectorService.cpp:87] GRPC server connectivity is successful
[I 20221118 071354 NetworkStatusNotifier.cpp:168] Started network status notifier.
[I 20221118 071354 NetworkStatusNotifier.cpp:182] Established network connection info stream.
[I 20221118 071354 SignalServiceClient.cpp:43] Trying to establish GRPC stream for signals ...
[I 20221118 071354 SignalServiceClient.cpp:61] Successfully established GRPC stream for signals.

And that's it. In StackRox console I can see only this message: "No processes discovered. The selected deployment may not have running pods, or Collector may not be running in your cluster. It is recommended to check the logs for more information". Nothing happens.

And if I check the "System Health" status, everything is "green". Maybe you have some ideas about what I am doing wrong? Thanks in advance!

Heuristic for BTF support

Add a new heuristic to Collector to figure out whether the BTF probe can be used. The main things to verify are (a shell-level sketch follows below):

  • BTF support in the kernel. Could be done similarly to bpftool, by checking the kernel configuration.

  • BTF vmlinux is available. Similar to libbpf, usually at /sys/kernel/btf/vmlinux.

  • BPF ringbuf maps are available. There are helpers in libbpf that could be useful.

Part of #1011
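
As a shell-level illustration of the checks listed above (not the Collector implementation itself), BTF availability can be probed roughly like this; the kernel config path varies by distribution:

# Is the BTF-annotated vmlinux exposed by the kernel?
test -f /sys/kernel/btf/vmlinux && echo "BTF vmlinux present"

# Was the kernel built with BTF type information? (config path differs per distro)
grep -q CONFIG_DEBUG_INFO_BTF=y "/boot/config-$(uname -r)" && echo "kernel BTF enabled"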

Improve stacktraces in Collector

The Falco modern bpf engine is not yet fully tested in the field. To make troubleshooting of potential issues easier, improve Collector stacktraces to provide more information.

Part of #1011

Embed a probe into the image

Modify the build pipeline to embed a probe into the image. Here we don't have to bother yet with which probe it is; to test as close to reality as possible we could e.g. build a vanilla Falco modern probe with the libsinsp example and use it. This will involve similar steps, e.g. actually building something, fetching a new dependency (libbpf), etc.

Part of #1014

Remove Kernel Module collection from stackrox/stackrox

Collector eBPF kernel-module error in On Premise Kubernetes Cluster

Hello! The collector pods are bouncing between Running and CrashLoopBackOff due to the below error:

[I 20221108 103043 collector.cpp:329] Attempting to download eBPF probe - Candidate kernel versions:
[I 20221108 103043 collector.cpp:331] 5.15.35-5.el7.3.x86_64
[I 20221108 103043 GetKernelObject.cpp:180] Local storage does not contain collector-ebpf-5.15.35-5.el7.3.x86_64.o
[I 20221108 103043 FileDownloader.cpp:316] Fail to download /module/collector-ebpf.o.gz - Failed writing body (0 != 10)
[I 20221108 103043 FileDownloader.cpp:318] HTTP Request failed with error code '404' - HTTP Body Response: not found
[I 20221108 103044 FileDownloader.cpp:316] Fail to download /module/collector-ebpf.o.gz - Failed writing body (0 != 10)
[I 20221108 103044 FileDownloader.cpp:318] HTTP Request failed with error code '404' - HTTP Body Response: not found
......
[W 20221108 103112 FileDownloader.cpp:332] Failed to download /module/collector-ebpf.o.gz
[W 20221108 103112 GetKernelObject.cpp:183] Unable to download kernel object collector-ebpf-5.15.35-5.el7.3.x86_64.o to /module/collector-ebpf.o.gz
[W 20221108 103112 collector.cpp:343] Error getting kernel object: collector-ebpf-5.15.35-5.el7.3.x86_64.o
[I 20221108 103112 collector.cpp:215] gRPC server=sensor.stackrox.svc:443
[I 20221108 103112 collector.cpp:357] Attempting to connect to GRPC server
[E 20221108 103112 collector.cpp:359] Unable to connect to the GRPC server.
[F 20221108 103112 collector.cpp:368] No suitable kernel object downloaded

How can I troubleshoot this? How can I build my own kernel module?

Remove kernel module collection from collector

This should cover the actual deletion of code related to kernel module collection from within collector.

  • Kernel module loading
  • Kernel module driver download logic
  • HostHeuristics that force kernel module collection
  • ForceKernelModules configuration option

Load probes from the image

Teach Collector to load probes located inside the image. Here we don't have to bother yet with what type of probe it is; e.g. for testing it could be a regular eBPF probe. The search path could be fixed, and later aligned with CI changes.

Part of #1011

Wire up open_modern_bpf

Incorporate modern bpf engine into CO-RE kernel candidate, including open_modern_bpf and any other needed bits.

Part of #1008

Fix exec core_bpf probes for COS

COS kernels have a task_struct that differs from the vanilla one in regard to how audit information is stored; loginuid is present in an intermediate structure, audit_task_info. This means a regular CO-RE read will not pass the verifier on COS kernels. To address it, check the field's presence first, before extracting the value. Note, it's important that the condition is formulated this way -- the first branch should be the existing one, otherwise it will be removed as dead code and no comparison will be made.

Part of #1008

Probe validation after load

After Collector has loaded the modern probe, it has to perform a few simple checks that it works:

  • Verify the Collector process is scraped and reported.
  • Spin up a canary process and verify it is reported.
  • Verify the Collector open connections are scraped and reported.
  • Open a connection and verify it is reported.

The list above is a suggestion; any items that make sense could be added or removed (if too complicated). A similar idea could be e.g. verifying that the bpf maps contain the expected records (configuration maps, tail-called maps, ringbuf for events), but we should go for that only if it can be implemented with relatively little effort.

Part of #1011

Run collector in ‘standalone’ mode

Right now collector requires a gRPC server in order to run, so when we need to debug an issue we have to either:

  • Deploy it alongside a full StackRox deployment.
  • Deploy it with a mock gRPC server that mocks sensor’s behaviour

It’d be great if we could directly run the collector binary without any external requirements, since it would make debugging quick changes a lot faster (compile the binary and run vs. compile, create the image, setup the environment, deploy the image with the change, etc.).

In order to get this done, the first requirement would be to make the grpc connection optional:

if (!useGRPC) {
  CLOG(INFO) << "GRPC is disabled. Specify GRPC_SERVER='server addr' env and signalFormat = 'signal_summary' and signalOutput = 'grpc'";
}
std::shared_ptr<grpc::Channel> grpc_channel;
if (useGRPC) {
  grpc_channel = createChannel(args);
}
config.grpc_channel = std::move(grpc_channel);

I believe there are some checks in the configuration that also fail and prevent collector from running but I can’t find the exact place, it’s in either of these files:
https://github.com/stackrox/collector/blob/master/collector/lib/CollectorArgs.cpp
https://github.com/stackrox/collector/blob/master/collector/lib/CollectorConfig.cpp

We also want to keep the behavior of running with the gRPC server as the default, so any changes need to be placed behind a feature flag; an environment variable like COLLECTOR_STANDALONE set to a non-empty string would be nice. Here is an example of how we handle these flags:

const char* GetModuleDownloadBaseURL() {
  const char* module_download_base_url = std::getenv("MODULE_DOWNLOAD_BASE_URL");
  if (module_download_base_url && *module_download_base_url) return module_download_base_url;
  CLOG(DEBUG) << "MODULE_DOWNLOAD_BASE_URL not set";
  return "";
}

The final step would be to adjust the signal handlers we currently have to either be ignored or print to stdout. I think the easiest way would be to create a new signal handler and set it here:

if (conn_tracker) {
  AddSignalHandler(MakeUnique<NetworkSignalHandler>(inspector_.get(), conn_tracker, &userspace_stats_));
}
if (config.grpc_channel) {
  AddSignalHandler(MakeUnique<ProcessSignalHandler>(inspector_.get(), config.grpc_channel, &userspace_stats_));
}

For a first implementation, a signal handler printing process information to stdout would be nice, and the network signal can be directly turned off when running in standalone mode, leaving its implementation for a follow-up.

We have instructions on running the collector-builder image as a development environment in our how to start guide, running the collector binary directly in that container without any companion containers is the end goal.

Relax lower supported kernel for modern Falco probe

Currently the modern Falco probe has a hard-coded boundary for the oldest supported kernels, although it can also work with any kernel that has the needed features backported (BTF, ring buffer maps, trace programs).

Make the modern bpf engine more flexible: verify the required features instead of simply checking the kernel version.

Part of #1008

[optional] Make core_bpf loading results visible

This describes overall efforts about tightening up logging for core_bpf.

  • We need to go through various scenarios and verify whether the output information is correct and sufficient. E.g. right now, as far as I can see, even with core_bpf Collector reports that something like collector-ebpf-<kernel-version>.o is used, which is not quite true. Another annoyance is that in the debug output where Collector prints out its config, the collection method is mentioned as a number from an enum; the actual name would be better. I'm pretty sure there are plenty of such small rough edges, so it would be valuable to run the most important cases once (success, failure, heuristic failure, self-checks failure) and check the output.
  • Add more information when possible. E.g. I've noticed that the heuristics do not show the actual errno after probing. This leads to a system that is perfectly fine from a support perspective reporting that BTF is not supported, when it was e.g. lacking sys_admin privileges or similar. If that is not possible, mention it in the comments for our future selves.
  • Verify that core_bpf details are represented in the termination log when it makes sense.
  • In the heuristics there are already hints along the lines of "it's not supported, try something else". We could add similar hints if ebpf has failed, e.g. "Hint: missing or unsupported kernel issues could be avoided by using the core_bpf collection method".

Part of #1011

Sync up Falco fork

Update Falco fork to the latest changes and run the regular tests to verify regular probes are working.

Part of #1008

[ROX-15069] Collector support of BTF probes

The idea is to teach Collector to use the "modern" Falco probe engine in a backward-compatible way. To achieve that:

  • Implement a new way of loading probes located inside the image. #1017
  • Implement heuristics for Collector to find out whether BTF is supported. In the simplest scenario it could be a kernel version check, but ideally more elaborate feature verification is needed (in case of backported functionality). #1018
  • Implement a fallback mechanism to let Collector switch back to regular probes if it fails to load the new one. The definition of "failed to load" simply means an error from the kernel, but in theory it could be extended to more runtime validation that everything is in place. #1066
  • Display a failed BTF load via Collector status, similar to cases when no probes could be loaded. #1077
  • (Optional for MVP) Implement a new way of getting probes by searching in the container itself, since BTF-aware probes are going to be embedded into the container. This has to be implemented flexibly though, as there could still be cases when it's beneficial to load even a BTF-aware probe, and the original probes have to be loaded as usual anyway.
  • (Optional for MVP) Try to refactor the loading logic to make it more flexible, having in mind potentially delegating probe loading into the kernel to other components.

See ROX-15069 for the higher-level details.

Make Falco more flexible about including tail-called programs

For flexibility and operational purposes it's necessary to be able to run Falco modern probes with only a subset of tail-called programs loaded. This could be achieved relatively easily by customizing the build process to exclude certain BPF programs, and by reducing the error to a warning when they're not found by name later on.

Part of #1008

Extend existing sinsp-example based tests

Falco library features two sets of tests for probes:

  • e2e tests based on sinsp-example. They allow simulating a variety of low-level events relatively easily, but don't support modern probes directly.
  • The drivers test suite. More basic unit tests following the strategy "fire a syscall and verify what the driver has captured"; it supports modern probes as well.

Figure out whether those two cover the functionality of modern probes well enough, and if not, extend them to improve that and include them in the testing pipeline.

Part of #1008

Reorganize Falco fork

There was a significant amount of friction and lost time when updating the Falco fork. To combat this, it should be reorganized in such a way as to make updates easier and make it easy to remove patches that are unneeded or have been upstreamed.

The proposed procedure is:

  • make the main branch track upstream/master
  • extract stackrox patches onto module version branches (e.g. release-2.4 or similar) on top of the main branch.
  • The current master and previously tagged module versions will remain until they are out of support (up to module version 2.4.0, collector 3.14.1, ACS 4.0).

This can be simulated on separate branches before committing to the approach, but should be done before existing patches are removed or modified.
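
A rough sketch of how such a simulation could look with plain git, assuming the fork tracks falcosecurity/libs upstream; remote and branch names are placeholders:

# Make a local main branch that tracks upstream
git remote add upstream https://github.com/falcosecurity/libs.git
git fetch upstream
git checkout -B main upstream/master

# Extract the StackRox patches onto a module-version branch on top of main
git checkout -b release-2.4 main
git cherry-pick <stackrox-patch-commits...>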
