
logzio_aws_serverless's Introduction

AWS Serverless Shipper - Lambda

This is an AWS Lambda function that ships logs from AWS services to Logz.io.

Note: This project contains code for Python 2 and Python 3. We urge you to use Python 3 because Python 2.7 will reach end of life on January 1, 2020.

Get started with Python 3

logzio_aws_serverless's People

Contributors

8naama, alexbescond, asafm, danielberman, doron-bargo, idohalevi, imnotashrimp, manoli-yiannakakis, mend-bolt-for-github[bot], mirii1994, nathanhruby, nico-shishkin, resdenia, ronish31, sam-io, shay108, talhibner, tamir-michaeli, yyyogev


logzio_aws_serverless's Issues

GzipLogRequest fails with 413 (Too large upload size)

GzipLogRequest does not increment the self._decompress_size variable, so a batch with many records can exceed the upload limit and fail with a 413 (see https://docs.logz.io/shipping/log-sources/json-uploads.html#request-entity-too-large).

You can fix this by updating the write method (https://github.com/logzio/logzio_aws_serverless/blob/master/python3/shipper/shipper.py#L49) of the GzipLogRequest class:

    def write(self, log):
        bytes_to_write = bytes("\n" + log, 'utf-8') if self._logs_counter else bytes(log, 'utf-8')
        self._writer.write(bytes_to_write)
        self._decompress_size += sys.getsizeof(bytes_to_write)
        self._logs_counter += 1
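A minimal sketch of how the tracked decompressed size could then be used to flush a batch before it crosses the limit. The exceeds_limit helper and MAX_BODY_SIZE constant are illustrative names, not code from shipper.py:

```python
import gzip
import io
import sys

# Illustrative limit; the Logz.io docs linked above describe the actual
# per-upload size restriction.
MAX_BODY_SIZE = 10 * 1024 * 1024

class GzipLogRequest:
    """Minimal sketch of the gzip request wrapper, assuming the fixed write()."""
    def __init__(self):
        self._logs = io.BytesIO()
        self._writer = gzip.GzipFile(mode='wb', fileobj=self._logs)
        self._logs_counter = 0
        self._decompress_size = 0

    def write(self, log):
        bytes_to_write = bytes("\n" + log, 'utf-8') if self._logs_counter else bytes(log, 'utf-8')
        self._writer.write(bytes_to_write)
        self._decompress_size += sys.getsizeof(bytes_to_write)
        self._logs_counter += 1

    def exceeds_limit(self, next_log):
        # Flush the current batch before this write would cross the limit
        return self._decompress_size + sys.getsizeof(bytes(next_log, 'utf-8')) > MAX_BODY_SIZE
```

With the counter incremented on every write, a caller can check exceeds_limit before appending and ship the batch early instead of receiving a 413.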

JS logs containing [ and ] are failing to parse

When logging from JS code and the message contains [ and ] (such as when logging JSON that contains arrays), neither the log message nor its embedded JSON is parsed.

Repro:
Create a JS lambda with the following code:

    module.exports.handler = async (event, context) => {
      console.log('Just a message')
      console.log(JSON.stringify({
        message: 'just a JSON message'
      }))
      console.log(JSON.stringify({
        message: 'A message and array',
        array: ['with', 'data'],
      }))
      console.log(`Message with [brackets]`)

      return {
        statusCode: 200
      }
    }

And the messages that contain brackets aren't being parsed (screenshots omitted: a comparison of the parsed bracket-free messages and the unparsed bracket-containing messages).
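One bracket-tolerant approach would be to anchor JSON detection on the first '{' in the message rather than on bracket characters. This is only a sketch of the idea, not the shipper's current logic:

```python
import json

def extract_json(message):
    """Sketch: parse the JSON object embedded in a CloudWatch message,
    even when the message or the JSON itself contains '[' and ']'."""
    start = message.find('{')
    if start == -1:
        return None
    try:
        return json.loads(message[start:])
    except ValueError:
        return None

msg = '{"message": "A message and array", "array": ["with", "data"]}'
```

With this anchoring, a message whose JSON contains arrays still parses, while a plain message with brackets simply falls through as text.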

Cloudwatch log shipper doesn't work with lambda insights

We use the cloudwatch shipper.
We enabled lambda insights.
We started getting a lot of logzio-index-failure in our logs.
The index-failed-reason is

{"type":"mapper_parsing_exception","reason":"failed to parse field [@timestamp] of type [date] in document with id 'REDACTED'. Preview of field's value: 'EXTENSION'","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [EXTENSION] with format [strict_d...

The reason the "@timestamp" field is incorrectly populated seems to be this code.

Lambda insights logs seem to use a different format from normal logs

    EXTENSION	Name: cloudwatch_lambda_agent	State: Ready	Events: [INVOKE,SHUTDOWN]

These logs don't seem very valuable, so I'd imagine they should never be forwarded to logzio in the first place.
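A simple way to drop this chatter would be a prefix filter applied before shipping. The function and prefix list below are illustrative, not code from lambda_function.py:

```python
def should_ship(message):
    """Sketch: skip Lambda Insights extension lines (and the runtime's
    START/END/REPORT control lines) instead of shipping them."""
    skip_prefixes = ('EXTENSION', 'START RequestId', 'END RequestId', 'REPORT RequestId')
    return not message.startswith(skip_prefixes)
```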

Deploy via Serverless Application Repo ?

Would it be possible to push an official version to SAR?

This makes it super-easy to integrate into Cloudformation, because everyone can then just reference it directly in an existing template, which saves doing a deploy across all your accounts and regions.

If not, I'll probably just do a local build and push to private SAR.

[FEATURE] Python 2.7 Migration

Hi,

With Python 2.7 going EOL on 1 January 2020, are there any plans to migrate to Python 3.x or Go?

Kind regards,

Dan

Releases and tags

Hey.
Could you guys maybe create tags for released versions?
Also, ideally you could publish release artifacts that are more 'turn key', rather than us having to copy files around before zipping.
As it stands, it's pretty cumbersome to consume these as IaC.

Any improvement here would be great. For us, a serverless framework plugin would be ideal.

Thanks

Cloudwatch logs parsing issue

I believe there might be an error here when you're parsing the AWS Lambda logs

You have this on line number 58 in lambda_function code

    if len(message_parts) == 3:
        log['@timestamp'] = message_parts[0]
        log['requestID'] = message_parts[1]
        log['message'] = message_parts[2]

which I think should be

    if len(message_parts) == 4:
        log['@timestamp'] = message_parts[0]
        log['requestID'] = message_parts[1]
        log['logLevel'] = message_parts[2]
        log['message'] = message_parts[3]

I also think you were trying to ignore the lines in AWS Lambda logs that start with START, END, and REPORT, but they're still getting through.
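The proposed fix could be sketched as follows, assuming the runtime's tab-separated line format (timestamp, request ID, log level, message); the function name is illustrative:

```python
def parse_lambda_log(message):
    """Sketch: split a Python-runtime Lambda log line into its four
    tab-separated parts, and skip START/END/REPORT control lines."""
    if message.startswith(('START', 'END', 'REPORT')):
        return None  # control lines should not be shipped as log events
    parts = message.split('\t')
    if len(parts) == 4:
        return {
            '@timestamp': parts[0],
            'requestID': parts[1],
            'logLevel': parts[2],
            'message': parts[3],
        }
    return {'message': message}
```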

Ability to enrich the CloudWatch logs with custom properties

A useful feature for the Lambda would be the ability to enrich the logs with custom properties.
A use case is using multiple AWS accounts for different environments (dev, test, UAT, production), where the AWS services are not aware of the environment at the time of logging.

When we ship the logs to Logz.io, it will be handy to add custom properties, such as environment: testing, which will make the querying process easier.

As an implementation, the Lambda could leverage variables that are transformed into environment variables at runtime. It could be a single variable, properties_to_enrich: environment=testing;foo=bar, which would produce a property environment with value testing and a property foo with value bar.
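Parsing such a variable is straightforward; here is a sketch (the function name is illustrative, and properties_to_enrich is the variable name proposed above):

```python
import os

def parse_enrichment(raw):
    """Sketch: turn 'environment=testing;foo=bar' into a dict of
    properties to merge into each shipped log."""
    properties = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition('=')
        properties[key.strip()] = value.strip()
    return properties

# Read once at cold start, then merge into every log before shipping
enrichment = parse_enrichment(os.environ.get('properties_to_enrich', ''))
```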

Cloudwatch log-shipper process exits with Exception on list JSON message

Problem

When a CloudWatch log message contains a JSON list, the shipper raises the following exception:

    [ERROR] AttributeError: 'list' object has no attribute 'items'
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 142, in lambda_handler
        if _parse_cloudwatch_log(log, additional_data):
      File "/var/task/lambda_function.py", line 92, in _parse_cloudwatch_log
        _parse_to_json(log)
      File "/var/task/lambda_function.py", line 77, in _parse_to_json
        for key, value in json_object.items():

https://github.com/logzio/logzio_aws_serverless/blob/master/python3/cloudwatch/src/lambda_function.py#L72-L80
This exception is not caught since the exception handler is only looking for specific Error types.

Analysis

This can happen for any valid JSON payload that is not an object; I think that's only a list ([...]).

Since the error is not caught, the process exits, so all aws_logs_data that was consumed is likely not processed or shipped. The blast radius is even bigger if the user of the log shipper is shipping logs from multiple logGroups.

Opinion

IMHO, the try/except block should be handled in _parse_cloudwatch_log(), which seems to be the method that handles each log.
https://github.com/logzio/logzio_aws_serverless/blob/master/python3/cloudwatch/src/lambda_function.py#L144-L149
https://github.com/logzio/logzio_aws_serverless/blob/master/python3/cloudwatch/src/lambda_function.py#L83-L93

I also think the list type should be parsed correctly, but since I'm not aware of how the ingestion side processes the data, this might be much harder than I imagine.
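A minimal sketch of the guard (the function mirrors the shape of _parse_to_json as shown in the traceback, but is not the actual shipper code):

```python
import json

def parse_to_json(log):
    """Sketch: only flatten top-level JSON objects into the log record;
    leave arrays and other non-dict payloads as plain text."""
    try:
        json_object = json.loads(log['message'])
    except ValueError:
        return  # not JSON at all; keep the message as-is
    if not isinstance(json_object, dict):
        return  # e.g. a JSON list: nothing to merge key-by-key
    for key, value in json_object.items():
        log[key] = value
```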

Lambda sometimes timeouts after 60 seconds

Hi
We are using the shipper.py code in our Lambda, with a timeout of 60 seconds, to ship logs to Logz.io.
On very rare occasions we see 60 seconds (or more) pass between the time the request is sent and the time it returns, and then our Lambda times out.
We don't see any retry logs, so it looks like the first call to Logz.io doesn't return within 60 seconds, in shipper.py:

    request = urllib.request.Request(self._logzio_url, data=self._logs.bytes(),
                                     headers=self._logs.http_headers())
    return urllib.request.urlopen(request)

The amount of data sent in the specific Lambda call that timed out was relatively small, so it's not a size issue.
Questions:

  1. Do you expect, on rare occasions, the Logz.io server to take more than 60 seconds to respond?
  2. Why didn't you add a max timeout to the request call?

Any help would be great .
Thanks.
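For reference, urlopen accepts a timeout argument, so the call could be bounded. This is a sketch of the pattern, not the shipper's code; the function name and 10-second value are illustrative:

```python
import socket
import urllib.error
import urllib.request

def send_with_timeout(url, body, headers, timeout_seconds=10):
    """Sketch: bound the Logz.io request so a stalled connection can't
    consume the whole Lambda timeout."""
    request = urllib.request.Request(url, data=body, headers=headers)
    try:
        return urllib.request.urlopen(request, timeout=timeout_seconds)
    except socket.timeout:
        # Surface a retryable error instead of hanging until the Lambda dies
        raise urllib.error.URLError('request to Logz.io timed out')
```

A timeout raises an exception that the existing retry logic could catch, instead of the call blocking until the Lambda itself is killed.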

shipper.py's GzipLogRequest produces non-POSIX-compliant lines

POSIX defines a Line as:

3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

However, in shipper.py, a \n is added only between "lines" and not after every line. For example:

line1\nline2\nline3

This is not POSIX compliant and increases the complexity of the code (requires a counter and an additional if):

    def write(self, log):
        self._writer.write(bytes("\n" + log, 'utf-8')) if self._logs_counter else self._writer.write(bytes(log, 'utf-8'))
        self._logs_counter += 1

Assuming logz processes POSIX-compliant lines, the body of write(self, log) should be changed to something like

    self._writer.write(bytes(log + "\n", 'utf-8'))
    self._logs_counter += 1

The counter needs to be kept for __len__(self) only.
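A quick round-trip showing the proposed behavior, as a standalone sketch of the gzip stream rather than the shipper class itself:

```python
import gzip
import io

# Sketch of the proposed write(): terminate every line, don't separate lines
logs = io.BytesIO()
writer = gzip.GzipFile(mode='wb', fileobj=logs)
for log in ('line1', 'line2', 'line3'):
    writer.write(bytes(log + "\n", 'utf-8'))
writer.close()  # flushes the gzip trailer; logs itself stays open

body = gzip.decompress(logs.getvalue())
```

Every line now ends with "\n", matching the POSIX definition, and no counter or branch is needed in the write path.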

shipper.py's GzipLogRequest does not close underlying stream

The GzipLogRequest class is using a gzip.GzipFile class backed by an io.BytesIO stream:

    self._logs = io.BytesIO()
    self._writer = gzip.GzipFile(mode='wb', fileobj=self._logs)

The GzipLogRequest.close() method closes the GzipFile:

    def close(self):
        self._writer.close()

However, as per class gzip.GzipFile:

Calling a GzipFile object’s close() method does not close fileobj

So I believe the GzipLogRequest.close() should also call self._logs.close().
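A sketch of the fix, reduced to just the construction and teardown of the two streams:

```python
import gzip
import io

class GzipLogRequest:
    """Sketch of the proposed fix: close the backing stream as well."""
    def __init__(self):
        self._logs = io.BytesIO()
        self._writer = gzip.GzipFile(mode='wb', fileobj=self._logs)

    def close(self):
        self._writer.close()  # flushes and finalizes the gzip data
        self._logs.close()    # GzipFile.close() does not close fileobj
```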

Error when loading shipper into Lambda

When loading the shipper into Lambda and following the guide as described, the following error is received when tested, and the logs aren't being pulled through as expected:

    {
        "errorMessage": "Unable to import module 'lambda_function'"
    }

    START RequestId: e1cd6cb4-d3a9-11e8-b091-fb68b51def34 Version: $LATEST
    Unable to import module 'lambda_function': No module named shipper
    END RequestId: e1cd6cb4-d3a9-11e8-b091-fb68b51def34
    REPORT RequestId: e1cd6cb4-d3a9-11e8-b091-fb68b51def34 Duration: 0.96 ms Billed Duration: 100 ms Memory Size: 512 MB Max Memory Used: 18 MB

'memory_limit_in_mb' error from lambda_function.py

The current Lambda results in the following error when configured as instructed. Removing line #39 resolves the error:

    'memory_limit_in_mb': KeyError
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 77, in lambda_handler
        _parse_cloudwatch_log(log, aws_logs_data)
      File "/var/task/lambda_function.py", line 39, in _parse_cloudwatch_log
        log['memory_limit_in_mb'] = aws_logs_data['memory_limit_in_mb']
    KeyError: 'memory_limit_in_mb'
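Rather than removing the line, a defensive version could use dict.get so the field is simply omitted when the event lacks it. A sketch (the function name is illustrative, not the shipper's):

```python
def parse_cloudwatch_log(log, aws_logs_data):
    """Sketch: copy memory_limit_in_mb only when the event provides it,
    instead of raising KeyError when it is absent."""
    memory_limit = aws_logs_data.get('memory_limit_in_mb')
    if memory_limit is not None:
        log['memory_limit_in_mb'] = memory_limit
```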

json format always handled like text

How do I have to log messages in a Node.js Lambda function so the shipper will handle them as JSON?

Currently, when I console.log(logObj), this lands in the CloudWatch and Logz.io logs:

    2018-08-15T07:37:07.085Z 02af8f7c-a05e-11e8-acf8-1d43011671d3 { version: 'dev', .... }
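The object part of that line is Node's util.inspect formatting (unquoted keys, single quotes), which is not valid JSON; logging console.log(JSON.stringify(logObj)) instead emits real JSON. A sketch of the difference from the shipper's point of view:

```python
import json

# What console.log(logObj) produces: util.inspect style, not valid JSON
inspect_style = "{ version: 'dev' }"
# What console.log(JSON.stringify(logObj)) produces: parseable JSON
stringify_style = '{"version": "dev"}'

def try_parse(message):
    """Sketch: a parser like the shipper's can only succeed on real JSON."""
    try:
        return json.loads(message)
    except ValueError:
        return None
```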
