graylog2 / go-gelf
This project forked from deck36/go-gelf
GELF library for Go
Home Page: http://godoc.org/github.com/Graylog2/go-gelf/gelf
License: MIT License
Line 146 in message.go (v2.0 branch) is currently:
TimeUnix: float64(time.Now().Unix()),
`Unix()` returns whole seconds since 1970-01-01 (the Unix epoch), so log entries arriving in Graylog have their milliseconds truncated. Replacing it with
TimeUnix: float64(time.Now().UnixNano()) / float64(time.Second.Nanoseconds()),
works nicely.
Best regards.
There is a `newTCPReader` function, but it isn't exported, so there doesn't seem to be any exported way to build TCP-based reader applications on top of this library.
https://github.com/Graylog2/go-gelf/blob/v2/gelf/tcpreader.go#L22
```go
// issue.go
package main

import (
	"time"

	"gopkg.in/Graylog2/go-gelf.v2/gelf"
)

func main() {
	addr := "" // Graylog TCP endpoint, e.g. "127.0.0.1:12201"
	writer, _ := gelf.NewTCPWriter(addr)
	total := 10000    // messages overall
	goroutines := 100 // concurrent writers
	for q := 0; q < goroutines; q++ {
		go func() {
			for j := 0; j < total/goroutines; j++ {
				writer.WriteMessage(&gelf.Message{
					Version:  "1.0",
					Host:     "hostname",
					Short:    "-",
					TimeUnix: float64(time.Now().UnixNano()),
					Level:    6,
					Facility: "",
				})
			}
		}()
	}
	select {}
}
```
Running `go run -race issue.go` produces a DATA RACE warning.
A bugfix is available here: https://github.com/lpflpf/go-gelf/tree/v2
I'm using go-gelf with zerolog, and for every log message I get a "short write" error. With some debugging I've figured out that the issue is in `func (w *Writer) Write(p []byte)` in gelf/writer.go.
May 3 15:23:15 localhost twas[4839]: writemessage succeeded 50requested:51written:50zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG > 34338 _typ1=prom _typ2="processing time"
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 96requested:97written:96zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG debug log
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 57requested:58written:57zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG 12
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 50requested:51written:50zerolog: could not write event: short write
May 3 15:23:19 localhost twas[4839]: 3:23PM DBG > 33720 _typ1=prom _typ2="processing time"
May 3 15:23:19 localhost twas[4839]: writemessage succeeded 96requested:97written:96zerolog: could not write event: short write
May 3 15:23:19 localhost twas[4839]: 3:23PM DBG debug log
When zerolog requests a write of N bytes, the gelf writer always reports N-1 bytes written. If I remove
p = bytes.TrimSpace(p)
then everything works fine.
Please consider assigning version numbers and tagging releases. Tags/releases are useful to downstream package maintainers (in Debian and other distributions) for exporting source tarballs, automatically tracking new releases, and declaring dependencies between packages. Read more in the Debian Upstream Guide.
Versioning also encourages vendoring a particular (e.g. latest stable) release, as opposed to random unreleased snapshots.
Thank you.
Under high load, the UDP Reader will drop messages, corrupt a sizeable portion of those that arrive, and even panic.
The issues are caused by GELF chunks arriving out of order or interleaved with chunks of different messages. The Reader drops some messages that arrive out of order. Others are merged with chunks from different sources, causing corruption that either produces malformed JSON (crashing the JSON parser, so the message is dropped) or returns valid JSON with corrupt contents to the user. Sometimes it even results in a panic, crashing the application.
I wrote a test case to illustrate the issues here:
https://github.com/xose/go-gelf/blob/stress/gelf/reader_test.go
The test case starts a UDP Reader and creates several goroutines to simulate multiple sources sending a large number of GELF messages to the server. It then reads those messages and verifies their integrity.
Each "log" contains a long string (of random length) of the same character, different for each source, and extra fields with the character and expected message length. These values are used to check GELF message integrity.
The GELF messages are constructed by the test case instead of using the library itself to rule out an issue on other parts of the library. For GELF messages longer than a threshold the message is chunked. No compression is used. Test values can be configured changing the constants at the top of the file.
All errors will be logged on the console, and a summary will be printed at the end, like this:
5000 sent (1627 plain, 3373 chunked), 0 refused, 4584 received, (1473 OK, 3111 errors, 167 panics)
I recently started using v2 of this package and noticed the To Do section in the README. I created an example here of using the WriteMessage function. I've verified this code works.
I'm happy to create a PR for this if you folks think this example works well enough.
I'm tracking a bug where Graylog says it has received invalid GELF JSON (TCP transport, from docker GELF logger which is using this Writer).
It looks like this `mu` Mutex (line 29 in 7ebf4f5) is never used...
Here is the calling code in the Docker logger..
https://github.com/moby/moby/blob/master/daemon/logger/gelf/gelf.go#L184
And here's an extract from my Graylog logs...
2018-03-17 09:27:23,810 ERROR: org.graylog2.inputs.codecs.GelfCodec - Could not parse JSON, first 400 characters: {"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 twr1.thewaiting-room.net (80.82.112.63:443): true","timestamp":1521278842.705,"level":3,"_command":"./main","_container_id":"95abe5c05bde85e05a460b4d2a0875d779d23e9469722846430e96bbf8e5afec","_container_name":"nats_cve20162107.3.43d8bhzvc82qkc6ir2lrziyyi","_created":"2018-03-16T17:43:35.735128942Z","_image_id":"sha2...
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('v' (code 118)): was expecting comma to separate Object entries
at [Source: {"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 twr1.thewaiting-room.net (80.82.112.63:443): true","timestamp":1521278842.705,"level":3,"_command":"./main","_container_id":"95abe5c05bde85e05a460b4d2a0875d779d23e9469722846430e96bbf8e5afec","_container_name":"nats_cve20162107.3.43d8bhzvc82qkc6ir2lrziyyi","_created":"2018-03-16T17:43:35.735128942Z","_image_id":"sha256:a70d979f79007e38f57a0b77310145102934eee96c80a4df3f03b642e76aad6e","_image_name":"registry.gitlab.com/chrishartwig/sslping/nats_cve20162107:latest","_tag":"vul{"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 pip.lade.io (13.32.214.118:443): false","timestamp":1521278842.831,"level":3,"_command":"./main","_container_id":"8e68654e1ff2ec9838a10362141ac432ca0301dfa246ae52cfa8c81cbe59f9d7","_container_name":"nats_heartbleed.2.fx9blxsrii95z3w7a59t539s1","_created":"2018-03-16T17:42:22.48917432Z","_image_id":"sha256:6d1aed1d88d0b2b38f6f6435396670fa27f060656701efe86e731d3163759624","_image_name":"registry.gitlab.com/chrishartwig/sslping/nats_heartbleed:latest","_tag":"vuln"}; line: 1, column: 565]
It does occur under moderate load (13k messages in 4 min), and from the logs it looks like the two mingled messages are always from the same host, possibly from the same container (hard to tell with corrupted JSON)...
I don't know if it's a Graylog issue or a Docker logger issue (maybe the logger should use a mutex)...
What should happen if the DNS lookup for the GELF server returns multiple records?
A good default might be to round-robin between the servers to achieve load balancing. This is problematic with GELF chunking, though. All chunks for one message need to be delivered to the same server.
So if we round-robin, we have to ensure that we send all chunks (if there is more than one) to the same server.
There's currently a graylog2-server room in Gitter, but it would be nice to have one for go-gelf as well. Could some member of the Graylog2 project create this?
JSON loggers could put their fields into `Extra`; that makes logs structured and clear, as intended in Graylog.
My PR reuses the existing UnmarshalJSON function and works for me.
There may be some performance overhead from the additional parse.
This library's WriteMessage method appears to be broken: whenever I call it with any log level and a custom facility, messages are not delivered.
My code:
```go
graylogAddr := os.Getenv("GRAYLOG_ADDRESS")
gelfWriter, err := gelf.NewTCPWriter(graylogAddr)
if err != nil {
	log.Fatalf("gelf.NewTCPWriter: %s", err)
}
gelfWriter.Facility = "test"
log.SetOutput(io.MultiWriter(os.Stderr, gelfWriter))
log.Printf("This message is delivered as expected to graylog2@'%s'", graylogAddr) // works

// The following loses the message: nothing reaches Graylog, and no error is returned.
m := &gelf.Message{
	Version:  "1.1",
	Short:    "test message",
	Full:     "error details",
	TimeUnix: float64(time.Now().Unix()),
	Level:    gelf.LOG_DEBUG,
	Extra:    map[string]interface{}{"test_arg": 4},
}
if err = gelfWriter.WriteMessage(m); err == nil {
	// err is nil, yet the message never arrives
}
```
Graylog version: 5.2, clean host; added System > Source; Debug level is set in System > Logging.
We are currently only resolving the address of the GELF server when creating a new writer. This is problematic for long running processes because the IP address of the GELF server might change. Since this is UDP, the "connection" will never break and we will never notice.
To solve this, the IP address of the GELF server should be resolved again once the TTL of the DNS entry expires.
[This is in part a continuation of a discussion started in PR #9 where we were also worried about message length.]
The `message` field (sent as `short_message` in GELF) and the `full_message` field are stored in exactly the same way in Elasticsearch, so presumably they support the same sizes. I've actually tested `message` only up to 80 KB (via a null-delimited "RAW/Plaintext" interface) and 2500 lines or so, however. :-) It works fine, and the Graylog UI seems to do a reasonable job of dealing with it. (I have seen a complaint that long fields don't work, but that appears to be because they changed the default configuration to do term analysis on the `message` field, which isn't done by default.)
Having looked into this a bit, I find:
- `full_message` is optional to send in GELF, unlike `message`.
- `full_message` is not parsed for things like sources and timestamps.
- On its syslog inputs, Graylog has a `store_full_message: true` option which stores the raw syslog protocol message (usually RFC 5424) in the `full_message` field. This seems useful for debugging and handling situations where bad messages come in that the parser can't handle. We turned this on when we set up Graylog because it seemed like a good idea at the time, and it still doesn't strike me as a bad thing to do.
- Graylog search functionality seems to be oriented mainly towards `message`, and it seems to take a bit of extra work to search `full_message` instead.
From all this, I get the sense that `full_message` is considered something that can be safely ignored by clients retrieving and processing messages from Graylog. Given that, I think it's a bad idea, when given a message via the `io.Writer` interface, not to put the entire message into the `message` field.
As background, let me tell you about my use of Graylog at work. We have dozens of servers generating log messages into the Systemd Journal that we forward to a central Graylog installation so we can easily query across servers, do centralized analysis (such as with OSSEC LIDS log analysis), keep historical records, and so on.
In order to keep things as simple as possible, I'm aiming to get all our non-journal logging also going into the journal, including logs from programs that write directly to /var/log such as auditd, output of jobs run by cron, and so on. (For example, I'm intending to replace /usr/sbin/sendmail on those servers with a program that eats the message and logs it to the journal.)
One of the questions I had to ask myself is whether I should be using the `full_message` field, and what with various programs partially duplicating information across `message` and `full_message`, that seems like a bad idea. How would the OSSEC log scanner, for example, know that for some log entries `full_message` is all the information in `message` and a few other fields in unparsed form, yet for other messages `message` could be an incomplete log message and `full_message` contains the complete message but without other log entry information such as the timestamp? That way lies a lot of pain, especially if I end up having to assemble new versions of log entries for log-scanning software that expects things to be re-assembled in a syslog-ish format with the timestamp, source, etc. all in one big string.
So for these reasons, I'm suggesting we change the `io.Writer` API, `Write([]byte)`, of go-gelf to record the entire bytestring it's handed in the `message` field and not send a `full_message` field. I think it's reasonable for it to continue adding things like the extra fields for file and line number of the caller.
Note that if anybody does want to use the `full_message` field for something, or any other fields for that matter, we have the `Write(*Message)` interface for that, which allows (or should allow) full control over the GELF message being sent, within the limitations of the protocol. (We should not allow, e.g., a GELF message to be sent without a `message` field, since that field is required by the protocol.)
If we're agreed on this, I'm happy to submit a PR for changes to the code and update the documentation appropriately, including this rationale in an appropriate place (probably the commit message).
Hello,
It would be very nice to have support for TLS TCP inputs. I'm not sure what the status of the `tls` branch is.