
aws-for-fluent-bit's Introduction

AWS for Fluent Bit Docker Image

Welcome to AWS for Fluent Bit! Before using this Docker Image, please read this README entirely, especially the section on Consuming AWS for Fluent Bit versions 🫡

Contents

Consuming AWS for Fluent Bit versions

🔥⚠️WARNING⚠️🔥: Please read and understand the following information on how to consume AWS for Fluent Bit. Failure to do so may cause outages to your production environment. 😭💔

AWS Distro for Fluent Bit Release Tags

Our image repos contain the following types of tags, which are explained in the sections below:

AWS Distro for Fluent Bit release testing

Types of tests we run

  • Simple integration tests: Short running tests of the AWS output plugins that send log records and verify that all of them were received correctly formatted at the destination.
  • Load Tests: Test Fluent Bit AWS output plugins at various throughputs and check for log loss; the results are posted in our release notes: https://github.com/aws/aws-for-fluent-bit/releases
  • Long running stability tests: Highly parallel tests run in Amazon ECS for the AWS output plugins using the aws/firelens-datajet project. These tests simulate real Fluent Bit deployments and use cases to test for bugs that cause crashes.

Latest release testing bar

CVE Patch release testing bar

We do not run our long running stability tests for CVE patches. This is because the goal is to get the CVE patch out as quickly as possible, and because CVE patch releases never include Fluent Bit code changes. CVE patch releases only include base image dependency upgrades. If there is ever a CVE in the Fluent Bit code base itself, the patch for it would be considered a bug fix that might introduce instability and it would undergo the normal latest release testing.

Latest stable release testing bar

For a version to be made the latest stable, it must already have been previously released as the latest release. Thus it will have already passed the testing bar noted above for latest.

In addition, our stable release undergoes additional testing:

  • Long running stability tests: The version undergoes and passes these tests for at least 2 weeks. After the version is promoted to stable we continue to run the long running stability tests, and may roll back the stable designation if issues later surface.

Latest stable version

Our latest stable version is the most recent version that we have high confidence is stable for AWS use cases. We recommend using the stable version number in your prod deployments, but not the stable tag itself; see Guidance on consuming versions.

The latest stable version is marked with the stable tag (windowsservercore-stable for Windows images). The version number that is currently designated as the latest stable can always be found in the AWS_FOR_FLUENT_BIT_STABLE_VERSION file in the root of this repo.

There is no guarantee that stable has no issues; stable simply has a higher testing bar than our latest releases. The stable tag can be downgraded and rolled back to the previous stable if new test results or customer bug reports surface issues. This has occurred in the past. Consequently, we recommend locking to a specific version tag and informing your choice of version using our current stable designation.

Prior to being designated as the latest stable, a version must pass the following criteria:

  • It has been out for at least 2 weeks or is a CVE patch with no Fluent Bit changes. Stable designation is based on the Fluent Bit code in the image. A version released for CVE patches can be made stable if the underlying Fluent Bit code is already designated as stable.
  • No bugs have been reported in Fluent Bit which we expect will have high impact for AWS customers. This means bugs in the components that are most frequently used by AWS customers, such as the AWS outputs or the tail input.
  • The version has passed our long running stability tests for at least 2 weeks. The version would have already passed our simple integration and load tests when it was first released as the latest image.

CVE scans and latest stable

Please read our CVE patching policy.

The stable designation is for the Fluent Bit code contents of the image, not CVE scan results for dependencies installed in the image. We will upgrade a CVE patch to be the latest stable if it contains no Fluent Bit code changes compared to the previous latest stable.

Guidance on consuming versions

Our release notes call out the key AWS changes in each new version.

We recommend that you only consume non-stable releases in your test/pre-prod stages. Consuming the latest tag directly is widely considered to be an anti-pattern in the software industry.

We strongly recommend that you always lock deployments to a specific immutable version tag, rather than using our stable or latest tags. We recommend conducting a gradual rollout of each new version consistent with your deployment rollout strategy, as you would for any other code or dependency being deployed: first to non-production environments, then gradually to your production environments.

Using the stable or latest tag directly in prod has the following downsides: 🤕

  1. 😕Difficulty in determining which version was deployed: If you experience an issue, you will need to check the Fluent Bit log output to determine which specific version tag was deployed. This is because the stable and latest tags are mutable and change over time.
  2. 😐Mixed deployments: If you are in the middle of a deployment when we release an update to the mutable stable or latest tags, some of your deployment may have deployed the previous version, and the rest will deploy the new version.
  3. 🤢Difficulty in rolling back: While we take every effort to avoid releasing regressions, there is always a chance a bug might slip out. Explicitly consuming a version helps make it easier to rollback since there would be an existing deployment configuration to rollback to.

The best practice for consuming AWS for Fluent Bit is to check the AWS_FOR_FLUENT_BIT_STABLE_VERSION file and lock your prod deployments to that specific version tag. For example, if the current stable is 2.28.4, your deployment should use public.ecr.aws/aws-observability/aws-for-fluent-bit:2.28.4 not public.ecr.aws/aws-observability/aws-for-fluent-bit:stable.
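As a sketch of that practice, the snippet below resolves the pinned version from the stable file and builds the image reference. It assumes a local checkout of this repo; the 2.28.4 fallback is just the example version from this paragraph.

```shell
# Resolve the pinned version from the stable file in a local checkout of this
# repo; fall back to 2.28.4 (the example above) if the file is not present.
STABLE="$(cat AWS_FOR_FLUENT_BIT_STABLE_VERSION 2>/dev/null || echo 2.28.4)"
IMAGE="public.ecr.aws/aws-observability/aws-for-fluent-bit:${STABLE}"
echo "Pin your deployment to: ${IMAGE}"
```

Use the printed tag in your task definition or manifest rather than the stable tag itself.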

AWS Distro versioning scheme FAQ

The version of the AWS for Fluent Bit image is not linked to the version of Fluent Bit which it contains.

What does the version number signify?

We use the standard major.minor.patch versioning scheme for our image, AKA Semantic Versioning. The initial release with this versioning scheme is 2.0.0. Bug fixes are released in patch version bumps. New features are released in new minor versions. We strive to only release backwards incompatible changes in new major versions.

Please read the below on CVE patches in base images and dependencies. The semantic version number applies to the Fluent Bit code and AWS Go plugin code compiled and installed in the image.

Image Versions and CVE Patches

The AWS for Fluent Bit image includes the following contents:

  • A base image (currently Amazon Linux, Windows Server Core 2019, or Windows Server Core 2022)
  • Runtime dependencies installed on top of the base image
  • Fluent Bit binary
  • Several Fluent Bit Go Plugin binaries

The process for pushing out new builds with CVE patches in the base image or installed dependencies is different for Windows vs Linux.

For Windows, every month after the B release date/"patch Tuesday", we re-build and update all Windows images currently listed in the windows.versions file in this repo with the newest base images from Microsoft. The Fluent Bit and Go plugin binaries are copied into the newly released base Windows image. Thus, the Windows image tags are not immutable; only the Fluent Bit and Go plugin binaries are immutable over time.

For Linux, each image tag is immutable. When high or critical CVEs are reported in the base Amazon Linux image or installed Linux packages, we will work to push out a new image per our patching policy. However, we will not increment the semantic version number for a re-build that simply pulls in new Linux dependencies. Instead, we will add a 4th version number signifying the date the image was built.

For example, a series of releases in time might look like:

  1. 2.31.12: New Patch release with changes in Fluent Bit code compared to 2.31.11. This release will have standard release notes and will have images for both linux and windows.
  2. 2.31.12-20230629: Re-build of 2.31.12 just for Linux CVEs found in the base image or installed dependencies. The Fluent Bit code contents are the same as 2.31.12. There will only be Linux images with this version tag, and no Windows images. The latest tag for Linux will be updated to point to this new image. There will be short release notes that call out that it is simply a re-build for Linux.
  3. 2.31.12-20230711: Another re-build of 2.31.12 for Linux CVEs on a subsequent date. This release is handled in the same way as 2.31.12-20230629, as explained above.
  4. 2.31.13: New Patch release with changes in Fluent Bit code compared to 2.31.12. This might be for bugs found in the Fluent Bit code. It could also be for a CVE found in the Fluent Bit code. This release has standard release notes and linux and windows images.
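As an illustration of how the 4-part tags decompose (the tag value below is taken from the example list above), plain shell parameter expansion is enough to split them:

```shell
# Split a 4-part Linux image tag into the semantic version (the Fluent Bit
# code contents) and the re-build date appended for CVE patch re-builds.
TAG="2.31.12-20230629"
SEMVER="${TAG%%-*}"      # semantic version: 2.31.12
BUILD_DATE="${TAG##*-}"  # re-build date:    20230629
echo "code version ${SEMVER}, rebuilt ${BUILD_DATE}"
```

Two tags with the same semantic version prefix therefore contain the same Fluent Bit code, and differ only in base image and dependency patches.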

Why do some image tags contain 4 version numbers?

Please see the above explanation on our Linux image re-build process for CVEs found in dependencies.

Are there edge cases to the rules on breaking backwards compatibility?

One edge case for the above semantic versioning rules is changes to configuration validation. Between Fluent Bit upstream versions 1.8 and 1.9, validation of config options was fixed/improved. Before this distro's upgrade to Fluent Bit upstream 1.9, configurations that included certain invalid options would run without error (the invalid options were ignored). After we released Fluent Bit upstream 1.9 support, these invalid options were validated and Fluent Bit would exit with an error. See the issue discussion here.

Another edge case to the above rules is bug fixes that require removing a change. We have occasionally removed, and will continue to occasionally remove, new changes in a patch version if they were found to be buggy. We do this to unblock customers who do not depend on the recent change. Please always check our release notes for the changes in a specific version. A past example of a patch release that removed something is 2.31.4. A prior release had fixed how S3 handles the timestamps in S3 keys and the Retry_Limit configuration option. Those changes were considered to be bug fixes; however, they introduced instability, so we subsequently removed them in a patch.

What about the 1.x image tags in your repositories?

The AWS for Fluent Bit image was launched in July 2019. Between July and October of 2019 we simply versioned the image based on the version of Fluent Bit that it contained. During this time we released 1.2.0, 1.2.2 and 1.3.2.

The old versioning scheme was simple and it made it clear which version of Fluent Bit our image contained. However, it had a serious problem: how could we signify that we had changed the other parts of the image? If we did not update Fluent Bit, but updated one of the plugins, how would we signify this in a new release? There was no answer: we could only release an update when Fluent Bit released a new version. We ultimately realized this was unacceptable; bug fixes or new features in our plugins should not be tied to the Fluent Bit release cadence.

Thus, we moved to a new versioning scheme. Because customers were already relying on the 1.x tags, we have left them in our repositories. The first version with the new scheme is 2.0.0. From now on we will follow semantic versioning, but the move from 1.3.2 did not follow semantic versioning. There are no backwards incompatible changes between aws-for-fluent-bit:1.3.2 and aws-for-fluent-bit:2.0.0. Our release notes for 2.0.0 clearly explain the change.

Does this mean you are diverging from fluent/fluent-bit?

No. We continue to consume Fluent Bit from its main repository. We are not forking Fluent Bit.

Compliance and Patching

Q: Is AWS for Fluent Bit HIPAA Compliant?

Fluent Bit can be used in a HIPAA compliant manner to send logs to AWS, even if the logs contain PHI. Please see the call outs in the AWS HIPAA white paper for ECS.

Q: What is the policy for patching AWS for Fluent Bit for vulnerabilities, CVEs and image scan findings?

AWS for Fluent Bit uses ECR image scanning in its release pipeline and any scan that finds high or critical vulnerabilities will block a release: scripts/publish.sh

If you find an issue from a scan on our latest images, please follow the reporting guidelines below and we will work quickly to introduce a new release. To be clear, we do not patch existing images; we will just release a new image without the issue. The team uses Amazon ECR Basic image scanning and Amazon ECR Enhanced scanning powered by Amazon Inspector as the primary source of truth for whether or not the image contains a vulnerability in a dependency.

If your concern is about a vulnerability in the Fluent Bit upstream (github.com/fluent/fluent-bit open source code), please let us know as well. However, fixing upstream issues requires additional work and time because we must work closely with upstream maintainers to commit a fix and cut an upstream release, and then we can cut an AWS for Fluent Bit release.

Q: How do I report security disclosures?

If you think you’ve found a potentially sensitive security issue, please do not post it in the Issues on GitHub. Instead, please follow the instructions here or email AWS security directly at [email protected].

Debugging Guide

Please read the debugging.md

Use Case Guide

A set of tutorials on use cases that Fluent Bit can solve.

Public Images

Linux Images

Each release updates the latest tag and adds a tag for the version of the image. The stable tag is also available which marks a release as the latest stable version.

Windows Images

For Windows images, we update the windowsservercore-latest tag and add a tag as <VERSION>-windowsservercore. The stable tag is available as windowsservercore-stable. We update all the supported versions each month when Microsoft releases the latest security patches for Windows.

Note: Deploying latest/windowsservercore-latest to prod without going through a test stage first is not recommended.

arm64 and amd64 images

AWS for Fluent Bit currently distributes container images for the arm64 and amd64 CPU architectures. Our images all use multi-architecture tags. For example, this means that if you pull the latest tag on a Graviton instance, you get the arm64 image build.

For Windows, we release images only for the amd64 CPU architecture of the following Windows releases:

  • Windows Server 2019
  • Windows Server 2022

Using the init tag

The init tags indicate that an image contains the init process and supports multi-config. The init prefix is used in addition to our other tags; e.g., aws-for-fluent-bit:init-latest is the latest released image with multi-config support. For more information about the usage of multi-config, please see our use case guide and FireLens example.

Note: Windows images with the init tag are not available at the moment.

Using SSM to find available versions and aws regions

As of 2.0.0, there are SSM Public Parameters which allow you to see available versions. These parameters are available in every region that the image is available in. Any AWS account can query these parameters.

To see a list of available version tags, run the following command:

aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/ --query 'Parameters[*].Name'

Example output:

[
    "/aws/service/aws-for-fluent-bit/latest",
    "/aws/service/aws-for-fluent-bit/windowsservercore-latest",
    "/aws/service/aws-for-fluent-bit/2.0.0",
    "/aws/service/aws-for-fluent-bit/2.0.0-windowsservercore"
]

If there is no output, the AWS for Fluent Bit image is not available in the current region.

To see the ECR repository ID for a given image tag, run the following:

$ aws ssm get-parameter --name /aws/service/aws-for-fluent-bit/2.0.0
{
    "Parameter": {
        "Name": "/aws/service/aws-for-fluent-bit/2.0.0",
        "Type": "String",
        "Value": "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.0.0",
        "Version": 1,
        "LastModifiedDate": 1539908129.759,
        "ARN": "arn:aws:ssm:us-west-2::parameter/aws/service/aws-for-fluent-bit/2.0.0"
    }
}

Using SSM Parameters in CloudFormation Templates

You can use these SSM Parameters as parameters in your CloudFormation templates.

Parameters:
  FireLensImage:
    Description: Fluent Bit image for the FireLens Container
    Type: AWS::SSM::Parameter::Value<String>
    Default: /aws/service/aws-for-fluent-bit/latest

Using image tags

You should lock your deployments to a specific version tag. We guarantee that these tags are immutable: once they are released, they will not change. Windows images will be updated each month to include the latest security patches in the base layers, but the Fluent Bit contents of the image will not change for a given tag.

Amazon ECR Public Gallery

aws-for-fluent-bit

Our images are available in the Amazon ECR Public Gallery. We recommend that customers download images from this public repo. You can pull images with different tags using the following command:

docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:<tag>

For example, you can pull the image with the latest version by running:

docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:latest

If you see errors for image pull limits, or get the following error:

Error response from daemon: pull access denied for public.ecr.aws/amazonlinux/amazonlinux, repository does not exist or may require 'docker login': denied: Your authorization token has expired. Reauthenticate and try again.

Then try logging into public ECR with your AWS credentials:

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

You can check the Amazon ECR Public official doc for more details.

Docker Hub

amazon/aws-for-fluent-bit

Amazon ECR

We also provide images in Amazon ECR for high availability. These images are available in almost every AWS region, including AWS GovCloud.

The official way to find the ECR image URIs for your region is to use the SSM Parameters. In your region, run the following command:

aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/

Using the debug images

Deploying AWS for Fluent Bit debug images can help the AWS team troubleshoot an issue. If you experience a bug, especially a crash/SIGSEGV issue, then please consider deploying the debug version of the image. After a crash, the debug image can print out a stacktrace and upload a core dump to S3. See our debugging guide for more info on using debug images.

For debug images, we update the debug-latest tag and add a tag as debug-<Version>.

Plugins

We currently bundle the following projects in this image:

Using the AWS Plugins outside of a container

You can use the AWS Fluent Bit plugins with td-agent-bit.

We provide a tutorial on using SSM to configure instances with td-agent-bit and the plugins.

Running aws-for-fluent-bit Windows containers

You can run aws-for-fluent-bit Windows containers using the image tags as specified under Windows Images section. These are distributed as multi-arch images with the manifests for the supported Windows releases as specified above.

For more details about running Fluent Bit Windows containers in Amazon EKS, please visit our blog post.

For more details about running Fluent Bit Windows containers in Amazon ECS, please visit our blog post. For running Fluent Bit as an Amazon ECS Service using the daemon scheduling strategy, please visit our Amazon ECS tutorial. For more details about using the AWS provided default configurations for Amazon ECS, please visit our documentation.

Note: There is a known issue with networking failure when running Fluent Bit in Windows containers on default container network. Check out the guidance in our debugging guide for a workaround to this issue.

Development

Local testing

Use make release to build the image.

To run the integration tests, run make integ-dev. The make integ-dev command will run the integration tests for all of our plugins: Kinesis Streams, Kinesis Firehose, and CloudWatch.

The integ tests require the following env vars to be set:

  • CW_INTEG_VALIDATOR_IMAGE: Build the integ/validate_cloudwatch/ folder with docker build and set the resulting image as the value of this env var.
  • S3_INTEG_VALIDATOR_IMAGE: Build the integ/s3/ folder with docker build and set the resulting image as the value of this env var.
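A sketch of wiring those env vars up, per the bullets above. The local image tags are arbitrary placeholders, and the docker build and make steps are commented out since they require a repo checkout and a Docker daemon:

```shell
# Build the validator images from the repo, then export their tags for the
# integ tests. The tag names here are arbitrary placeholders.
# docker build -t cw-integ-validator integ/validate_cloudwatch/
# docker build -t s3-integ-validator integ/s3/
export CW_INTEG_VALIDATOR_IMAGE="cw-integ-validator"
export S3_INTEG_VALIDATOR_IMAGE="s3-integ-validator"
echo "${CW_INTEG_VALIDATOR_IMAGE} ${S3_INTEG_VALIDATOR_IMAGE}"
# make integ-dev
```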

To run integration tests separately, execute make integ-cloudwatch or make integ-kinesis or make integ-firehose.

See the documentation on the GitHub steps for releases.

Developing Features in the AWS Plugins

You can build a version of the image with code from your GitHub fork. To do so, you must set the following environment variables. Otherwise, you will see an error message like the following: fatal: repository '/kinesis-streams' or '/kinesis-firehose' or '/cloudwatch' does not exist.

Set the following environment variables for CloudWatch:

export CLOUDWATCH_PLUGIN_CLONE_URL="Your GitHub fork clone URL"
export CLOUDWATCH_PLUGIN_BRANCH="Your branch on your fork"

Or for Kinesis Streams:

export KINESIS_PLUGIN_CLONE_URL="Your GitHub fork clone URL"
export KINESIS_PLUGIN_BRANCH="Your branch on your fork"

Or for Kinesis Firehose:

export FIREHOSE_PLUGIN_CLONE_URL="Your GitHub fork clone URL"
export FIREHOSE_PLUGIN_BRANCH="Your branch on your fork"

Then run make cloudwatch-dev or make kinesis-dev or make firehose-dev to build the image with your changes.
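For example, a CloudWatch dev build might look like the following. The fork URL and branch are hypothetical placeholders, and the make invocation is commented out since it needs a full checkout:

```shell
# Hypothetical fork URL and branch; substitute your own values.
export CLOUDWATCH_PLUGIN_CLONE_URL="https://github.com/your-user/cloudwatch-plugin-fork.git"
export CLOUDWATCH_PLUGIN_BRANCH="my-feature"
echo "building from ${CLOUDWATCH_PLUGIN_CLONE_URL} @ ${CLOUDWATCH_PLUGIN_BRANCH}"
# make cloudwatch-dev
```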

To run the integration tests on your code, execute make integ-cloudwatch-dev or make integ-kinesis-dev or make integ-firehose-dev.

Fluent Bit Examples

Check out Fluent Bit examples from our amazon-ecs-firelens-examples repo.

License

This project is licensed under the Apache-2.0 License.

aws-for-fluent-bit's People

Contributors

carmenapuccio, claych, davidnewhall, dependabot[bot], drewzhang13, galaoaoa, hankwallace, hencrice, hossain-rayhan, jaredcnance, jeffunderhill, joebowbeer, johnjameswhitman, konoui, kzys, lesandeep, lubingfeng, matthewfala, meghnapr, meghnaprabhu, pettitwesley, rawahars, robinverduijn, rs-garrick, shelbyz, somanyhs, sonofachamp, swapneils, zhonghui12, zwj102030


aws-for-fluent-bit's Issues

Update to fluentbit 1.4

When can we expect this to be updated to fluentbit 1.4?
I need the new tag rewrite plugin

Container receives SIGSEGV

Hi, for the past few weeks we noticed our services on ECS were going down for no apparent reason. After some investigation we discovered that the latest image of aws-for-fluent-bit is receiving a SIGSEGV signal indicating a segmentation fault, see screenshot.

[Screenshot 2020-07-30 at 09 35 23]

I don't have any steps to reproduce but it might be network related, if we change the router to use CloudWatch instead of Datadog it doesn't fail (at least we couldn't observe any failure since we deployed 3 days ago). With this hypothesis I'm assuming that there are more network errors when sending data to Datadog than to CloudWatch.

We decided to rollback to 2.3.1 in production and it works normally. I'm not sure if this is of any use for debugging purposes but let me know which info you would need to fix the issue. Thanks!

Fluent bit Cloudwatch plugin use instance role

Hi, I'm trying to run Fluent Bit on Amazon Linux 2. Everything works fine except that the CloudWatch plugin seems unable to authenticate using the Instance Profile.

[2020/11/13 15:22:40] [ info] [engine] started (pid=7706)
[2020/11/13 15:22:40] [ info] [storage] version=1.0.6, initializing...
[2020/11/13 15:22:40] [ info] [storage] in-memory
[2020/11/13 15:22:40] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/11/13 15:22:40] [ warn] [aws_credentials] Failed to initialized profile provider: $HOME not set and AWS_SHARED_CREDENTIALS_FILE not set.
[2020/11/13 15:22:40] [ info] [sp] stream processor started

Any idea how I can tell the Cloudwatch plugin to use the IAM instance Profile to authenticate towards AWS?

Memory leak for idle sidecar

Hey aws team,

we are using EC2 awsfirelens in our ECS tasks (non-Fargate) with a fluent-bit (aws-for-fluent-bit docker image) sidecar (one fluent-bit container per task) in our dev environment. We are sending logs to Datadog using your awsfirelens Datadog integration easily. Before moving to prod, we realized that the fluent-bit sidecar containers from some tasks with a very low rate (practically they are idle, as we haven't used them for the last few days) show non-stop memory growth. Our current configuration is not really different from your GitHub examples, but let me show you:

My task definition with the idle container (non-used) called my-service and the sidecar container called my-service-logs-router:

TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-service-family
      TaskRoleArn: !Ref TaskRole
      ExecutionRoleArn: !Ref TaskExecutionRole
      ContainerDefinitions:
        - Name: my-service-logs-router
          Essential: true
          Image: 906394416424.dkr.ecr.us-west-1.amazonaws.com/aws-for-fluent-bit:2.3.1
          Cpu: 16
          Memory: 64
          MemoryReservation: 48
          FirelensConfiguration:
            Type: fluentbit
            Options:
              enable-ecs-log-metadata: true
              config-file-type: s3
              config-file-value: arn:aws:s3:::my-s3-bucket/config.conf
          PortMappings:
            - ContainerPort: 2020
          DockerLabels:
            PROMETHEUS_EXPORTER_PORT: 2020
            PROMETHEUS_EXPORTER_PATH: /api/v1/metrics/prometheus
          Environment:
            - Name: FLB_LOG_LEVEL
              Value: debug
        - Name: my-service
          Essential: true
          Image: !Sub my-service-docker-image
          Cpu: !Ref ContainerCpu
          Memory: !Ref ContainerHardMemory
          MemoryReservation: !Ref ContainerSoftMemory
          PortMappings:
            - ContainerPort: 8080
          LogConfiguration:
            LogDriver: awsfirelens
            Options:
              Name: datadog
              Host: http-intake.logs.datadoghq.com
              dd_service: my-service
              dd_source: go
              dd_message_key: log
              dd_tags: !Sub "env:${Environment}"
              TLS: on
              provider: ecs
            SecretOptions:
              - Name: apikey
                ValueFrom: datadog-ssm-secret
                

Our s3 config file (we are just copying your graceful settings and adding prometheus server as we wanted to dig into this issue but the issue also happens without turning on HTTP server metrics):

[SERVICE]
    # Flush
    # =====
    # Set an interval of seconds before to flush records to a destination
    Flush 1

    # Grace
    # =====
    # Set the grace period in seconds
    Grace 30

    # HTTP Server
    # Enable/Disable the built-in HTTP Server for metrics
    # ===========
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

For instance, these are the prometheus metrics from the fluent-bit sidecar logging container for the last 22 hours from a service that is totally idle (no-one is using it and it is not producing any log at all):
[image: Prometheus metrics from the idle fluent-bit sidecar]

And here you can look at the fluent bit input / output bytes and error rate (they show no errors from our output side)

[image: Fluent Bit input/output bytes and error rate]

These are the service logs shown by the fluent-bit sidecar container (debug level):

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
AWS for Fluent Bit Container Image Version 2.3.1
tput: No value for $TERM and no -T specified
Fluent Bit v1.4.2
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/07/21 14:42:41] [ info] Configuration:
[2020/07/21 14:42:41] [ info]  flush time     | 1.000000 seconds
[2020/07/21 14:42:41] [ info]  grace          | 30 seconds
[2020/07/21 14:42:41] [ info]  daemon         | 0
[2020/07/21 14:42:41] [ info] ___________
[2020/07/21 14:42:41] [ info]  inputs:
[2020/07/21 14:42:41] [ info]      forward
[2020/07/21 14:42:41] [ info]      forward
[2020/07/21 14:42:41] [ info]      tcp
[2020/07/21 14:42:41] [ info] ___________
[2020/07/21 14:42:41] [ info]  filters:
[2020/07/21 14:42:41] [ info]      record_modifier.0
[2020/07/21 14:42:41] [ info] ___________
[2020/07/21 14:42:41] [ info]  outputs:
[2020/07/21 14:42:41] [ info]      null.0
[2020/07/21 14:42:41] [ info]      datadog.1
[2020/07/21 14:42:41] [ info] ___________
[2020/07/21 14:42:41] [ info]  collectors:
[2020/07/21 14:42:41] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/07/21 14:42:41] [debug] [storage] [cio stream] new stream registered: forward.1
[2020/07/21 14:42:41] [debug] [storage] [cio stream] new stream registered: tcp.2
[2020/07/21 14:42:41] [ info] [storage] version=1.0.3, initializing...
[2020/07/21 14:42:41] [ info] [storage] in-memory
[2020/07/21 14:42:41] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/07/21 14:42:41] [ info] [engine] started (pid=1)
[2020/07/21 14:42:41] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/07/21 14:42:41] [ info] [input:forward:forward.0] listening on unix:///var/run/fluent.sock
[2020/07/21 14:42:41] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/07/21 14:42:41] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
[2020/07/21 14:42:41] [ info] [input:tcp:tcp.2] listening on 127.0.0.1:8877
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] scheme: https://
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] api_key: xxxxxxxxxxxxxxxxxxxxxxxxx
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] uri: /v1/input/xxxxxxxxxxxxxxxxxxxxxxxxx
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] host: http-intake.logs.datadoghq.com
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] port: 443
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] json_date_key: timestamp
[2020/07/21 14:42:41] [debug] [output:datadog:datadog.1] compress_gzip: 0
[2020/07/21 14:42:41] [debug] [router] match rule tcp.2:null.0
[2020/07/21 14:42:41] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2020/07/21 14:42:41] [ info] [sp] stream processor started
[2020/07/21 14:43:00] [debug] [task] created task=0x7f66d722c400 id=0 OK
[2020/07/21 14:43:01] [ info] [output:datadog:datadog.1] https://http-intake.logs.datadoghq.com, port=443, HTTP status=200 payload={}
[2020/07/21 14:43:01] [debug] [task] destroy task=0x7f66d722c400 (task_id=0)
[2020/07/21 14:43:15] [debug] [task] created task=0x7f66d722c400 id=0 OK
[2020/07/21 14:43:16] [ info] [output:datadog:datadog.1] https://http-intake.logs.datadoghq.com, port=443, HTTP status=200 payload={}
[2020/07/21 14:43:16] [debug] [task] destroy task=0x7f66d722c400 (task_id=0)
[2020/07/21 14:43:30] [debug] [task] created task=0x7f66d722c400 id=0 OK
[2020/07/21 14:43:31] [ info] [output:datadog:datadog.1] https://http-intake.logs.datadoghq.com, port=443, HTTP status=200 payload={}
[2020/07/21 14:43:31] [debug] [task] destroy task=0x7f66d722c400 (task_id=0)
[2020/07/21 14:43:45] [debug] [task] created task=0x7f66d722c400 id=0 OK
[2020/07/21 14:43:46] [ info] [output:datadog:datadog.1] https://http-intake.logs.datadoghq.com, port=443, HTTP status=200 payload={}
[2020/07/21 14:43:46] [debug] [task] destroy task=0x7f66d722c400 (task_id=0)
[2020/07/21 14:44:00] [debug] [task] created task=0x7f66d722c400 id=0 OK

Do you have any idea where the issue comes from? Thanks!

Unexpected behaviour when using config with many [OUTPUT] sections

Howdy,
Running the latest image tag, d043d7e505fa, I get unexpected behaviour with my config file, which has 45 [OUTPUT] sections.

I am using the 'cloudwatch' output plugin to ship logs to separate log groups in AWS CloudWatch.
For some logs, the events are shipped to the wrong log group as well as the correct one. This snippet shows two examples:

time="2020-10-23T12:38:18Z" level=info msg="[cloudwatch 9] Created log stream fluentbit-kube.var.log.containers.camel-schedule-watcher-1603456680-zzf4n_camel_snapshotter-a1e5b58882df724794ba7097b68757cf3c759fd995416a3f60391918af207289.log in group /mycompany/camel/bearpig-internal-eks/application/external-dns"
time="2020-10-23T12:38:18Z" level=info msg="[cloudwatch 42] Created log stream fluentbit-kube.var.log.containers.camel-schedule-watcher-1603456680-zzf4n_camel_snapshotter-a1e5b58882df724794ba7097b68757cf3c759fd995416a3f60391918af207289.log in group /mycompany/camel/bearpig-internal-eks/application/camel-schedule-watcher"
time="2020-10-23T12:38:18Z" level=info msg="[cloudwatch 5] Created log stream fluentbit-kube.var.log.containers.camel-epg-watcher-1603456680-dmp8p_camel_snapshotter-7b7145575dad03dcb1e1ef0ed0e35cea30413b0d55883cd990a3fc00c0c4899f.log in group /mycompany/camel/bearpig-internal-eks/application/argocd-redis"
time="2020-10-23T12:38:18Z" level=info msg="[cloudwatch 38] Created log stream fluentbit-kube.var.log.containers.camel-epg-watcher-1603456680-dmp8p_camel_snapshotter-7b7145575dad03dcb1e1ef0ed0e35cea30413b0d55883cd990a3fc00c0c4899f.log in group /mycompany/camel/bearpig-internal-eks/application/camel-epg-watcher"

(The log event appears to trigger an additional output for which the tag does not match)
I only see this behaviour when the second line shows [cloudwatch <n>] where n > 32.
Is there a maximum of 32 [OUTPUT] sections?
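
As context for configurations like this, one commonly suggested way to avoid very large numbers of [OUTPUT] sections with the cloudwatch plugin is to template the log group name from record metadata, so a single output can fan out to many log groups. A minimal sketch, assuming the Kubernetes filter has enriched the records; the group path and label names below are illustrative, not from the report:

```ini
# Sketch: one templated output instead of 45 separate ones.
# The $(kubernetes[...]) template variables are supported by the
# cloudwatch Go output plugin; the paths/labels here are made up.
[OUTPUT]
    Name              cloudwatch
    Match             kube.*
    region            eu-west-1
    log_group_name    /mycompany/camel/bearpig-internal-eks/application/$(kubernetes['labels']['app'])
    log_stream_prefix fluentbit-kube.
    auto_create_group true
```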

Feature Request: Human readable container name from ECS metadata

When ECS metadata is turned on, FireLens attaches some extra ECS metadata to each log line. However, this metadata does not include the human-readable container name, only an autogenerated one:

{
    "container_id": "957b36352863d2af06fa223f73e8e2f26d900c3f12ba81bc4df58c9e390a9a4d",
    "container_name": "/ecs-chattersubscriptionsubscriptiontaskdefinition84AF3B49-39-app-d0c89395d3efaac05200",
    "ec2_instance_id": "i-0e8c1703f57233826",
    "ecs_cluster": "chatter-base-ClusterEB0386A7-A8Q2H6M4RI1I",
    "ecs_task_arn": "arn:aws:ecs:us-east-1:209640446841:task/chatter-base-ClusterEB0386A7-A8Q2H6M4RI1I/56a8667e7a454b0a8c7a39d5723de11e",
    "ecs_task_definition": "chattersubscriptionsubscriptiontaskdefinition84AF3B49:39",
    "log": "REDACTED",
    "source": "stderr"
}

Ideally there would also be a human-readable container name taken from the ECS task definition, for easier querying.

Sending stderr and stdout streams to CloudWatch and Elasticsearch using Fluent Bit fails

I am trying to send the stdout stream to CloudWatch (CW) and the stderr stream to Elasticsearch (ES) using a custom Fluent Bit configuration, which fails with the error below. The strange thing is that sending the log stream to a single destination (one [OUTPUT] section in fluent-bit.conf) works, but as soon as both output sections with two different streams are added to the configuration, it fails.

Image: amazon/aws-for-fluent-bit:latest

Please find my configuration below:

fluent-bit.conf
[SERVICE]
Parsers_File /fluent-bit/etc/parsers.conf
Streams_File /fluent-bit/etc/stream_processing.conf
Log_Level debug

[FILTER]
Name parser
Match *
Key_Name log
Parser json
Reserve_Data True

[OUTPUT]
Name cloudwatch
Match source.stderr
region eu-west-1
log_group_name fluent-bit-demo
log_stream_prefix from-fluent-bit-
auto_create_group true

[OUTPUT]
Name es
Match source.stdout
Host some-example-domain.eu-west-1.es.amazonaws.com
http_user username
http_passwd pass
Port 443
Index bit
Type eslog
Logstash_Format On
tls on
tls.verify off

stream_processing.conf
[STREAM_TASK]
Name error_logs
Exec CREATE STREAM debug WITH (tag='source.stderr') AS SELECT * from TAG:'*' WHERE source = 'error';

[STREAM_TASK]
Name info_logs
Exec CREATE STREAM info WITH (tag='source.stdout') AS SELECT * from TAG:'*' WHERE source = 'info';

cat log/parsers.conf
[PARSER]
Name json
Format json

Error:

timestamp message

| 1593068528000 | [2020/06/25 07:02:07] [debug] [task] created task=0x7fc78442c680 id=0 without routes, dropping. |
| 1593068528000 | [2020/06/25 07:02:07] [debug] [task] destroy task=0x7fc78442c680 (task_id=0) |
| 1593068528000 | [2020/06/25 07:02:07] [debug] [task] created task=0x7fc78442c680 id=0 OK |
| 1593068528134 | [2020/06/25 07:02:08] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk |
| 1593068528134 | [2020/06/25 07:02:08] [debug] [retry] new retry created for task_id=0 attemps=1 |
| 1593068528135 | [2020/06/25 07:02:08] [ warn] [engine] failed to flush chunk '1-1593068525.500047490.flb', retry in 7 seconds: task_id=0, input=info > output=es.0 |
| 1593068535030 | [2020/06/25 07:02:15] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk |
| 1593068535030 | [2020/06/25 07:02:15] [debug] [retry] re-using retry for task_id=0 attemps=2
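
As a possible alternative (an assumption on my part, not part of the original report), the same error/info split can be sketched with the rewrite_tag filter instead of stream processor tasks, assuming the parsed records contain a source field with values error and info:

```ini
# Sketch: retag records based on the 'source' field so each [OUTPUT]
# can Match one of the new tags. Rule syntax: Rule $KEY REGEX NEW_TAG KEEP
[FILTER]
    Name   rewrite_tag
    Match  *
    Rule   $source ^error$ source.stderr false
    Rule   $source ^info$  source.stdout false
```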

labels dropped after connection blip

We see that pod labels are dropped after a pod experiences and recovers from a temporary network interruption.
For some reason only the labels are no longer forwarded; the log message itself is processed as expected.

Reproducing steps:

  1. temporary network interruption - the service retries and recovers
[2020/07/01 15:27:51] [ warn] net_tcp_fd_connect: getaddrinfo(host='http-xyz.splunkcloud.com'): Name or service not known                                                            
[2020/07/01 15:27:51] [ warn] [engine] failed to flush chunk '1-1593617268.658873710.flb', retry in 11 seconds: task_id=1, input=tail.0 > output=xyz-splunk                                         │
[2020/07/01 15:27:55] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc'): Name or service not known                                                                             
[2020/07/01 15:27:55] [error] [filter:kubernetes:kubernetes.0] upstream connection error                                                                                                            
[2020/07/01 15:28:02] [ info] [engine] flush chunk '1-1593617268.658873710.flb' succeeded at retry 1: task_id=1, input=tail.0 > output=xyz-splunk
  2. from now on the log messages do not contain labels any more
  3. restarting the pod resolves the issue

Failed to source credential on Amazon EKS IAM Roles for Service Account

Upon upgrading to aws-for-fluent-bit 2.8 (Fluent Bit 1.6), the following error messages keep appearing, and they show that the Fluent Bit pod keeps sourcing AWS credentials from the underlying EKS worker node (EC2 instance) rather than from the annotated EKS IAM Role for Service Accounts (IRSA).

[2020/10/16 09:52:24] [error] [output:es:es.3] HTTP status=403 URI=/_bulk, response: {"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [indices:data/write/bulk] and User [name=arn:aws:iam::XXX873347XXX:role/eksctl-cluster-1-nodegroup-ng-al1-NodeInstanceRole-7GZZR0O6HRQS, backend_roles=[arn:aws:iam::XXX873347XXX:role/eksctl-cluster-1-nodegroup-ng-al1-NodeInstanceRole-7GZZR0O6HRQS], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [indices:data/write/bulk] and User [name=arn:aws:iam::XXX873347XXX:role/eksctl-cluster-1-nodegroup-ng-al1-NodeInstanceRole-7GZZR0O6HRQS, backend_roles=[arn:aws:iam::XXX873347XXX:role/eksctl-cluster-1-nodegroup-ng-al1-NodeInstanceRole-7GZZR0O6HRQS], requestedTenant=null]"},"status":403}

The config of fluent bit is here:

[OUTPUT]
    Name            es
    Match           kube.*
    Host            amazon-es-domain.ap-southeast-1.es.amazonaws.com
    Port            443
    TLS             On
    Logstash_Format On
    Logstash_Prefix eks-cluster-1
    Retry_Limit     10
    AWS_Auth        On
    AWS_Region      ap-southeast-1
    Generate_ID     On
    Replace_Dots    On

Windows support

I'd like to be able to deploy a daemon set that uses the same image name for both Linux and Windows.

Multiline log guidance

We have the following configuration

{
      "essential": true,
      "name": "log_router",
      "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
          "enable-ecs-log-metadata": "true"
        }
      },
      "memoryReservation": 50,
      "image": "906394416424.dkr.ecr.${AWS_REGION}.amazonaws.com/aws-for-fluent-bit:latest"
    },

We want to have multiline logs for stack traces etc.
How should I configure Fluent Bit?

arm64 build needed

Now that ECR supports multi-arch images, and EC2 instance types using Graviton2 processors have reached GA, it's time for us to build dual x86-64/arm64 images using multi-arch builds.

ECS Metadata

Hi there,
From what I understand, Firelens automatically adds some ECS metadata to the logs.
I'm not using Firelens (we use Shippable and unfortunately it's incompatible), but have Fluent Bit set up and I have ECS Container Metadata enabled. My question is, how can we manually add ECS metadata? Any guidance would be much appreciated.
Thank you.

EDIT:
I know there's a record_modifier filter in your example configs, but the values would be static. Is there a way to have variables using the ECS metadata?

    [FILTER]
    Name record_modifier
    Match *
    Record ec2_instance_id i-01dce3798d7c17a58
    Record ecs_cluster furrlens
    Record ecs_task_arn arn:aws:ecs:ap-south-1:144718711470:task/737d73bf-8c6e-44f1-aa86-7b3ae3922011
    Record ecs_task_definition firelens-example-twitch-session:5
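
One hedged possibility, rather than a confirmed FireLens feature: Fluent Bit expands ${VARIABLE} environment variables in configuration files, so the static values above could be replaced with variables populated (for example from the ECS task metadata endpoint) before Fluent Bit starts. The variable names here are assumptions:

```ini
# Sketch: record values taken from environment variables instead of
# hard-coded strings; ECS_CLUSTER and ECS_TASK_ARN are assumed to be
# set on the container (e.g. from the task metadata endpoint).
[FILTER]
    Name   record_modifier
    Match  *
    Record ecs_cluster ${ECS_CLUSTER}
    Record ecs_task_arn ${ECS_TASK_ARN}
```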

Default "log" key is not replaced by custom key.

I'm trying to change the default log key of the fluent-bit container, but it seems I'm not able to get it to work.

Dockerfile:

FROM amazon/aws-for-fluent-bit:latest
ADD fluent-bit.conf /
ADD fluent-bit.conf /fluent-bit/etc/
ADD parsers.conf /
ADD parsers.conf /fluent-bit/etc/
USER root

Fluent Bit conf (used the Modify plugin to change the key)
[SERVICE]
Parsers_File /parsers.conf
Log_Level debug

[INPUT]
Name forward
Listen 0.0.0.0
Port 24224

[INPUT]
Name tail
Path /opt/akoshalogs/*.log

[FILTER]
Name modify
Match *
Rename log message

[OUTPUT]
Name stdout
Match *

Parsers conf file
[PARSER]
Name json
Format json

How I'm running it:
docker run -it fluentbit bash

bash-4.2# mkdir /opt/akoshalogs/
bash-4.2# echo  "{\"key\": \"$(date)\"}" >> /opt/akoshalogs/filename.log
bash-4.2# cd fluent-bit/bin/
bash-4.2# ./fluent-bit \
  -i tail \
  -p path=/opt/akoshalogs/filename.log \
  -t testcase \
  -o stdout

Output still shows: [0] testcase: [1593853555.324835600, {"log"=>"{"key": "Sat Jul 4 09:05:42 UTC 2020"}"}]
@PettitWesley Could you help with this?
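
For reference, the tail input has a Key option that names the record key at ingest time, which would make the rename filter unnecessary. A sketch of the configuration above with that option:

```ini
# Sketch: name the record key directly on the tail input
# instead of renaming it afterwards with a modify filter.
[INPUT]
    Name tail
    Path /opt/akoshalogs/*.log
    Key  message
```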

Creating different log groups pattern for EKS pods

Hi,

I would like to create different log groups based on tags in EKS pods, like:

 log_group_name /eks/luminary/$(kubernetes['namespace_name'])/$(kubernetes['labels']['app'])-$(kubernetes['labels']['version'])
[OUTPUT]
        Name cloudwatch
        Match   **
        region us-east-2
        log_group_name /eks/clustername/$(kubernetes['namespace_name'])/$(kubernetes['labels']['app'])-$(kubernetes['labels']['version'])
        #log_group_name /eks/clustername/$(kubernetes['namespace_name'])/$(kubernetes['container_name'])
        log_stream_prefix fluentbit-
        auto_create_group true

The issue here is that k8s core pods do not have these labels.
So is it possible to create a second [OUTPUT] in the ConfigMap that matches only internal k8s namespaces like kube-system and kube-node-lease? Then I could configure a different log_group_name pattern for them,
and I should be able to exclude these namespaces in the other output.

Thanks
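
A sketch of what such a second output could look like, under the assumption that the standard Kubernetes tail setup is used, where the tag embeds the namespace as _<namespace>_ from the log file name; the regex and group path below are illustrative:

```ini
# Sketch: route system namespaces to their own log group pattern.
# Match_Regex is evaluated against the tag, which for the standard
# kubernetes tail setup contains "_<namespace>_".
[OUTPUT]
    Name              cloudwatch
    Match_Regex       kube\.var\.log\.containers\..+_(kube-system|kube-node-lease)_.+
    region            us-east-2
    log_group_name    /eks/clustername/system/$(kubernetes['namespace_name'])
    log_stream_prefix fluentbit-
    auto_create_group true
```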

Feature request: slim down the Docker image

Currently the image is around 553 MB, which makes it difficult to use as, for example, a pod sidecar. Can we get it down to a reasonable size without all the unnecessary software installed? It could be a multi-stage build, for example.

kubernetes fields in logs

@PettitWesley

I deployed aws-for-fluent-bit:2.8.0 and the logs didn't come with Kubernetes fields; I only get the log, stream, and time fields.

When I deployed aws-for-fluent-bit:2.3.1, the logs came with Kubernetes fields such as container_name, docker_id, namespace_name, and pod_name.

How can I get the Kubernetes fields in the latest version?

Thanks

InvalidParameterException: Log event too large

Hi,

we are using this tool in K8s as a DaemonSet and sometimes get the following error (it depends on which node the pod is running on and which other pods are running on that node):

time="2020-08-26T08:24:17Z" level=error msg="[cloudwatch 0] InvalidParameterException: Log event too large: 635616 bytes exceeds limit of 262144\n\tstatus code: 400, request id: 9d79f780-3842-4438-a73d-dc6cc54864c8\n" [2020/08/26 08:24:17] [ warn] [engine] chunk '1-1598430244.412167060.flb' cannot be retried: task_id=2, input=tail .0 > output=cloudwatch.0

So I know that there is this limit in AWS CW: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CalculatePutEventsEntrySize.html

Is it possible to handle this issue (e.g. truncation, ...)?

Best regards,
Albert

2.7.0 update overwrote 2.6.1 as well?

Hey!

Due to a legacy configuration we've been running aws-for-fluent-bit in Kubernetes using the :latest tag. This morning we've started getting issues.

I was already working on a configuration that would pin the version of aws-for-fluent-bit to 2.6.1. This configuration was working as intended yesterday, however, after re-deploying the exact same configuration today, I've started seeing the exact same problems that occur with the latest tag.

I've noticed that on hub.docker.com, both the 2.6.1 and the 2.7.0 images were updated 9 hours ago (right around when our problems began).

It also looks like the 2.6.1 image includes Fluent Bit 1.5.6, even though the changelog says it should include 1.5.2; 1.5.6 is the version that should be included in 2.7.0.

Could it be that the 2.7.0 image has also been tagged with 2.6.1?

Edit: I just noticed that the digests for the 2.6.1 and 2.7.0 images are also the same.

Incorrect image path in SSM parameters in AWS China regions

While getting the Fluent Bit image path from an SSM parameter with the following command:

aws ssm get-parameter --name /aws/service/aws-for-fluent-bit/2.0.0 --region cn-north-1

the domain in the image path is amazonaws.com, which is incorrect and should be amazonaws.com.cn. The following are the paths for different versions:

128054284489.dkr.ecr.cn-north-1.amazonaws.com/aws-for-fluent-bit:2.0.0
128054284489.dkr.ecr.cn-north-1.amazonaws.com/aws-for-fluent-bit:2.1.0
128054284489.dkr.ecr.cn-north-1.amazonaws.com/aws-for-fluent-bit:2.1.1
128054284489.dkr.ecr.cn-northwest-1.amazonaws.com/aws-for-fluent-bit:2.0.0
128054284489.dkr.ecr.cn-northwest-1.amazonaws.com/aws-for-fluent-bit:2.1.0
128054284489.dkr.ecr.cn-northwest-1.amazonaws.com/aws-for-fluent-bit:2.1.1

Not all versions return the incorrect path; the paths for the 2.2.0 and latest tags are correct:

128054284489.dkr.ecr.cn-north-1.amazonaws.com.cn/aws-for-fluent-bit:2.2.0
128054284489.dkr.ecr.cn-north-1.amazonaws.com.cn/aws-for-fluent-bit:latest
128054284489.dkr.ecr.cn-northwest-1.amazonaws.com.cn/aws-for-fluent-bit:2.2.0
128054284489.dkr.ecr.cn-northwest-1.amazonaws.com.cn/aws-for-fluent-bit:latest

Exit Code 137 for newest release

We are triggering Fargate tasks with a Step Function. For the newest Fluent Bit container this fails because the container always exits with exit code 137 when the actual container in the task finishes.
For the previous version, 1.3.2, this is not a problem.

Request: a way to change the Fluent Bit counter and flowcounter plugin output format

Hi all,

I've used https://docs.fluentbit.io/manual/pipeline/outputs/counter and https://docs.fluentbit.io/manual/pipeline/outputs/flowcounter to get data statistics in the format below:
// counter output
1603087116.245969,17 (total = 17)
// flow counter output
[out_flowcounter] [1603087136, {"counts":50, "bytes":66015, "counts/minute":0, "bytes/minute":1100 }]

This is not compatible with the CloudWatch metric filter pattern, which uses a space delimiter, because I want to extract only 17 as a single value or single column. Would you be able to format it as JSON only, or add spaces like Apache logs?

Please support log_retention_days

Configuration parameters to set log retention in the config, rather than manually in the console, would be an extremely useful feature.

FluentD:
log_retention_days 3

FluentBit:
[error] [config] cloudwatch_logs: unknown configuration property 'log_retention_days'. The following properties are allowed: region, log_group_name, log_stream_name, log_stream_prefix, log_key, extra_user_agent, log_format, role_arn, auto_create_group, endpoint, sts_endpoint, metric_namespace, and metric_dimensions.

Missing timestamp

Hi there,
I'm new to Fluent Bit and logging, so I've been following this tutorial: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/

I have the fluentd aggregator daemon running in my ECS cluster and it's collecting logs from an application container.

I've managed to send my logs to ES and Kibana, but I can't seem to figure out how to get a Timestamp field in there. Here are my configs:

Dockerfile

FROM amazon/aws-for-fluent-bit:latest
ADD fluent-bit.conf /fluent-bit/etc/
ADD parsers.conf /fluent-bit/etc/

fluent-bit.conf

[SERVICE]
    Parsers_File parsers.conf

[INPUT]
    Name forward
    unix_path /var/run/fluent.sock

[FILTER]
    Name parser
    Match **
    Parser docker
    Key_Name log

[OUTPUT]
    Name firehose
    Match **
    delivery_stream my-test-stream
    region ca-central-1
    time_key  time
    time_key_format %Y-%m-%dT%H:%M:%S%z

parsers.conf

[PARSER]
    Name docker
    Format json
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep On

My log still ends up as one string with no timestamp field. I'm not quite sure what I've misconfigured. I've also tried the nginx parser as I have nginx logs, but that doesn't work either.

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

Any idea what might be missing?
Thanks

Add openssl to docker image for 1.7

Fluent Bit 1.7 has been refactored to use OpenSSL instead of mbedTLS. It requires OpenSSL 1.1.x.

On Amazon Linux it can be installed with:

yum install openssl11-devel

AWS ECS Entrypoint container support required - ECS Task failing

We need an entrypoint in the ECS task definition for this container, but when we try to add one it fails because we don't know what "path" and "command" need to go into the entrypoint. Because there is no entrypoint, our security solution with Twistlock Defender is not working: Twistlock Defender works as a sidecar and inserts an entry into the entrypoint. It works for other applications, but not for this specific container, "amazon/aws-for-fluent-bit:latest". We don't know what valid "path" and "command" values need to go in there.

"entryPoint": [
"/twistlock/fargate_defender.sh",
"fargate",
"entrypoint",
"/entrypoint.sh",
"run"
],

Does CloudWatch plugin support proxy?

Bug Description
I'm trying to send a log from EKS to CloudWatch using Amazon's CloudWatch output plugin -- unfortunately, Fluent Bit appears to fail at the first step (retrieving the required IAM credentials and assuming the required IAM role) because it cannot access sts.amazonaws.com without a proxy:

time="2020-11-25T21:51:32Z" level=error msg="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.amazonaws.com/: dial tcp XXX:443: i/o timeout"

So, is there a way to specify a proxy in the CloudWatch plugin config, that will only apply to that plugin?

ps: I know that, in v1.6.4 of Fluent Bit, support for a global HTTP_PROXY env var was introduced -- unfortunately, as I explain here, due to its lack of support for an accompanying NO_PROXY env var, that doesn't solve my problem either. So, ideally, I need some proxy config that will only apply to this CloudWatch output plugin.

Config
Here's my Fluent Bit config for CloudWatch:

[OUTPUT]
    Name              cloudwatch
    Match             *
    region            us-east-1
    log_group_name    test-group
    log_stream_name   $(tag)
    auto_create_group true

Environment

  • AWS Fluent Bit Docker Image Version: 2.9.0
  • Fluent Bit: 1.6.3
  • Kubernetes (EKS)

Issue using newrelic plugin when supplying custom fluentbit config file

When supplying a custom Fluent Bit config file and using the newrelic plugin like so:

[OUTPUT]
    Name newrelic
    Match *
    licenseKey 

My log router container logs the following to CloudWatch:
plugin 'newrelic' cannot be loaded | Output plugin 'newrelic' cannot be loaded
For sending logs to Datadog I do the same thing and haven't had any issues.

I noticed that the docker image has both the datadog and newrelic plugins compiled into the fluentbit binary, has anyone gotten newrelic to work this way? We need to supply a custom configuration as we are outputting to multiple destinations.

How to sending logs to different destinations

Hi,

I wanted to ask if it is possible to split incoming logs in the log router and send them to different locations using FireLens.

For example, app logs and system logs:

  • System logs should go to CloudWatch log group.
  • App logs should go to Fluentd > ES > Kibana

Currently, all the logs go to a single location using FireLens.

Any help here would mean a lot. Thanks
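
In plain Fluent Bit terms (outside of what FireLens generates automatically), splitting to two destinations is just two [OUTPUT] sections with different Match patterns. A minimal sketch; the tags, region, log group, and host below are illustrative assumptions:

```ini
# Sketch: system-tagged logs to CloudWatch, app-tagged logs to a
# Fluentd forwarder. Tags, region, and host names are made up.
[OUTPUT]
    Name              cloudwatch
    Match             system.*
    region            us-east-1
    log_group_name    system-logs
    log_stream_prefix fluentbit-
    auto_create_group true

[OUTPUT]
    Name  forward
    Match app.*
    Host  fluentd-aggregator.example.com
    Port  24224
```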

How do I replace the "log" key with something else?

I'm using a fluent-bit sidecar in ECS to ship logs from my app to Datadog. I want to use the msg key that Datadog expects, but I've been seeing the log key coming from Fluent Bit. Is there a way to change this?

From fluent/fluent-bit#1331, there is a reference to the fluent-bit tail docs, where I see

Key: When a message is unstructured (no parser applied), it's appended as a string under the key name log. This option allows defining an alternative name for that key. (Default: log)
$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 906394416424.dkr.ecr.us-west-2.amazonaws.com
$ docker pull 906394416424.dkr.ecr.us-west-2.amazonaws.com/aws-for-fluent-bit:latest
$ docker run -it 906394416424.dkr.ecr.us-west-2.amazonaws.com/aws-for-fluent-bit:latest bash
bash-4.2# echo  "{\"key\": \"$(date)\"}" >> /tmp/example
bash-4.2# cd fluent-bit/bin/
bash-4.2# ./fluent-bit \
  -i tail \
  -p path=/tmp/example \
  -t testcase \
  -o stdout
...
[0] testcase: [1592505323.913605400, {"log"=>"{"key": "Thu Jun 18 18:33:18 UTC 2020"}"}]
bash-4.2# ./fluent-bit \
  -i tail \
  -p path=/tmp/example \
  -p key=asshole \
  -t testcase \
  -o stdout
...
[0] testcase: [1592505323.913605400, {"asshole"=>"{"key": "Thu Jun 18 18:33:18 UTC 2020"}"}]

If I change the configuration file /fluent-bit/etc/fluent-bit.conf to

[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

[INPUT]
    Name        tail
    Path        /tmp/example
    Key         msg

[OUTPUT]
    Name stdout
    Match   *

and run

bash-4.2# ./fluent-bit \
  -t testcase \
  -o stdout \
  -c /fluent-bit/etc/fluent-bit.conf
...
[0] tail.1: [1592505889.030174300, {"msg"=>"{"key": "Thu Jun 18 18:33:18 UTC 2020"}"}]

Is there an environment variable or a way to configure this? or would I have to overwrite the configuration file? If I do have to overwrite the config, is there a way I can pass it as an env var or do I need to put it in S3 and provide the S3 path?

What would be the best way forward? Thanks.
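
One note that may help here (the applicability to FireLens is an assumption, but the mechanism itself exists): Fluent Bit expands ${VARIABLE} references in configuration files, so a custom config baked into the image could take the key name from an environment variable instead of hard-coding it:

```ini
# Sketch: key name taken from an environment variable; LOG_KEY is an
# assumed variable name that would be set on the container.
[INPUT]
    Name tail
    Path /tmp/example
    Key  ${LOG_KEY}
```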

Make it work on EKS Fargate

We have been able to successfully deploy the fluent bit agent as a sidecar container using the configuration here: aws/containers-roadmap#701 (comment)

But one of the main limitations with Fargate is storage. Is there any way to handle purging with this fluent bit agent? Otherwise our log files will easily reach the 20GB limit.

Fluent-bit sidecar is killed because of networking error

Hey AWS team, I want to flag an issue we are currently having in our production system. We are using the following software versions:

  • aws-for-fluent-bit: 2.6.1
  • docker: 18.09.9-ce
  • ecs agent: 1.36.2

Our tasks are running on EC2 mode. I didn't check the behavior with fargate.

The Fluent Bit sidecar container was killed with exit code 139, and as it is an essential container, our task suddenly stopped.

Fluent-bit logs during the crash

[2020/08/17 07:32:29] [ info] [output:datadog:datadog.1] https://http-intake.logs.datadoghq.com, port=443, HTTP status=200 payload={}
[engine] caught signal (SIGSEGV)
[2020/08/17 07:33:00] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/08/17 07:33:00] [error] [src/flb_http_client.c:1077 errno=25] Inappropriate ioctl for device
[2020/08/17 07:33:00] [error] [output:datadog:datadog.1] could not flush records to http-intake.logs.datadoghq.com:443 (http_do=-1)

Docker daemon logs during the crash

Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.678373199Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.910700627Z" level=error msg="Failed to log msg \"\" for logger fluentd: write unix @->/var/run/fluent.sock: write: broken pipe"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.926290270Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.931976771Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.932007976Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.932909970Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.933183296Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.933869906Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.979820912Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.979910901Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.980036496Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.983592287Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:00 <ip> dockerd[3364]: time="2020-08-17T07:33:00.992278230Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.112475840Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.112527938Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.115452827Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.132519235Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.132559388Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.136487318Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.140677989Z" level=error msg="Failed to log msg \"\" for logger fluentd: fluent#send: can't send logs, client is reconnecting"
Aug 17 07:33:01 <ip> dockerd[3364]: time="2020-08-17T07:33:01.295414547Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Fluent bit Metrics

Memory and CPU for fluent-bit logging sidecar is stable over the time

(screenshot: memory and CPU graphs for the Fluent Bit sidecar)

And there are no detected errors in our Prometheus metrics for the Fluent Bit container around 09:33 (CEST, 07:33 UTC), but that might be because it crashed and the metrics were not scraped. Anyway, I'm sharing the screenshot to show that there are a bunch of errors with Datadog in the last 3 hours (query interval is 10 minutes)
(screenshot: Datadog error metrics for the Fluent Bit container)

I'm not sure if it's related to #63; the error messages are different but the behavior seems to be similar.

Some logs are not created in Cloudwatch with version 2.8.0

Hi,
I have been using aws-for-fluent-bit:latest in production for a few months with no issues, redirecting logs to CloudWatch.
For the past few days, it seems that some logs are no longer written to CloudWatch.
I went back to aws-for-fluent-bit:2.7.0 and it works perfectly.
I tried aws-for-fluent-bit:2.8.0 and I am experiencing the issue.

Here is the log of 2.7.0 version:

| 2020-10-14T17:18:25.438+04:00 | AWS for Fluent Bit Container Image Version 2.7.0
  | 2020-10-14T17:18:25.483+04:00 | Fluent Bit v1.5.6
  | 2020-10-14T17:18:25.483+04:00 | * Copyright (C) 2019-2020 The Fluent Bit Authors
  | 2020-10-14T17:18:25.483+04:00 | * Copyright (C) 2015-2018 Treasure Data
  | 2020-10-14T17:18:25.483+04:00 | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
  | 2020-10-14T17:18:25.483+04:00 | * https://fluentbit.io
  | 2020-10-14T17:18:25.484+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_group = <log_group>"
  | 2020-10-14T17:18:25.484+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_prefix = 'echo hello/'"
  | 2020-10-14T17:18:25.484+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_name = ''"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter region = 'eu-central-1'"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_key = 'log'"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter role_arn = ''"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter new_log_group_tags = ''"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_retention_days = '0'"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter endpoint = ''"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter sts_endpoint = ''"
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter credentials_endpoint = "
  | 2020-10-14T17:18:25.485+04:00 | time="2020-10-14T13:18:25Z" level=info msg="[cloudwatch 0] plugin parameter log_format = ''"
  | 2020-10-14T17:18:25.485+04:00 | [2020/10/14 13:18:25] [ info] [engine] started (pid=1)
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [storage] version=1.0.5, initializing...
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [storage] in-memory
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [input:forward:forward.0] listening on unix:///var/run/fluent.sock
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [input:tcp:tcp.2] listening on 127.0.0.1:8877
  | 2020-10-14T17:18:25.486+04:00 | [2020/10/14 13:18:25] [ info] [sp] stream processor started
  | 2020-10-14T17:18:26.239+04:00 | [engine] caught signal (SIGTERM)
  | 2020-10-14T17:18:26.301+04:00 | time="2020-10-14T13:18:26Z" level=info msg="[cloudwatch 0] Log group <log_group> exists\n"
  | 2020-10-14T17:18:26.338+04:00 | time="2020-10-14T13:18:26Z" level=info msg="[cloudwatch 0] Created log stream echo hello/utils-firelens- in group <log_group>"
  | 2020-10-14T17:18:26.354+04:00 | [2020/10/14 13:18:26] [ warn] [engine] service will stop in 5 seconds
  | 2020-10-14T17:18:30.913+04:00 | [2020/10/14 13:18:30] [ info] [engine] service stopped

Here is the log from version 2.8.0:

2020-10-14T17:20:51.085+04:00 | AWS for Fluent Bit Container Image Version 2.8.0
  | 2020-10-14T17:20:51.133+04:00 | Fluent Bit v1.6.0
  | 2020-10-14T17:20:51.133+04:00 | * Copyright (C) 2019-2020 The Fluent Bit Authors
  | 2020-10-14T17:20:51.133+04:00 | * Copyright (C) 2015-2018 Treasure Data
  | 2020-10-14T17:20:51.133+04:00 | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
  | 2020-10-14T17:20:51.133+04:00 | * https://fluentbit.io
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [engine] started (pid=1)
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [storage] version=1.0.6, initializing...
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [storage] in-memory
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [input:forward:forward.0] listening on unix:///var/run/fluent.sock
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [input:tcp:tcp.2] listening on 127.0.0.1:8877
  | 2020-10-14T17:20:51.135+04:00 | [2020/10/14 13:20:51] [ info] [sp] stream processor started
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_group = <log_group>"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_prefix = 'echo hello/'"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_name = ''"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter region = 'eu-central-1'"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_key = 'log'"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter role_arn = ''"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'false'"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter new_log_group_tags = ''"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_retention_days = '0'"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter endpoint = ''"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter sts_endpoint = ''"
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter credentials_endpoint = "
  | 2020-10-14T17:20:51.135+04:00 | time="2020-10-14T13:20:51Z" level=info msg="[cloudwatch 0] plugin parameter log_format = ''"
  | 2020-10-14T17:20:52.157+04:00 | [2020/10/14 13:20:52] [engine] caught signal (SIGTERM)
  | 2020-10-14T17:20:52.157+04:00 | [2020/10/14 13:20:52] [ info] [input] pausing forward.0
  | 2020-10-14T17:20:52.157+04:00 | [2020/10/14 13:20:52] [ info] [input] pausing forward.1
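A side note, not from the report itself: Fluent Bit's shutdown grace period, visible in the 2.7.0 log as "service will stop in 5 seconds", is configurable via the `Grace` key in the `[SERVICE]` section of the configuration. A minimal sketch:

```
[SERVICE]
    # Seconds Fluent Bit waits for pending output flushes after SIGTERM.
    # The default is 5, matching the "service will stop in 5 seconds" message.
    Grace 30
```

A longer grace period gives outputs more time to flush records that arrive just before the task is stopped, as in the startup-then-SIGTERM sequence shown in these logs.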

tput failure on startup

What

Every time a new container starts in our cluster we see the following logs.

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
AWS for Fluent Bit Container Image Version 2.2.0
tput: No value for $TERM and no -T specified

Not a huge deal; it's just a minor annoyance.
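The warning itself is standard `tput` behavior, reproducible anywhere: `tput` needs a terminal type, from either `$TERM` or the `-T` flag, and the image's banner script calls it in an environment where `$TERM` is unset. A small sketch (the `TERM=dumb` workaround is illustrative, not an official fix for this image):

```shell
# With $TERM unset and no -T flag, tput fails and prints the warning seen above.
env -u TERM tput bold 2>&1 || true

# Providing any terminal type (e.g. via the container environment) silences it;
# "dumb" simply has no bold capability, so tput exits quietly.
TERM=dumb tput bold >/dev/null 2>&1 || true
```

Setting a `TERM` environment variable on the log-router container is therefore one way to suppress the message.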
