graylog2 / go-gelf
This project forked from deck36/go-gelf
GELF library for Go
Home Page: http://godoc.org/github.com/Graylog2/go-gelf/gelf
License: MIT License
Line 146 in message.go (v2.0 branch) is currently:
TimeUnix: float64(time.Now().Unix()),
`Unix()` returns whole seconds since 1970-01-01 (the Unix epoch), so log entries arriving in Graylog have their milliseconds truncated. Replacing it with
TimeUnix: float64(time.Now().UnixNano()) / float64(time.Second.Nanoseconds()),
works nicely.
Best regards.
There is a `newTCPReader` function, but it isn't exported, so there doesn't seem to be any exported way to build TCP-based reader applications on top of this library.
https://github.com/Graylog2/go-gelf/blob/v2/gelf/tcpreader.go#L22
```go
// issue.go
package main

import (
	"time"

	"gopkg.in/Graylog2/go-gelf.v2/gelf"
)

func main() {
	addr := "" // Graylog TCP endpoint, e.g. "127.0.0.1:12201"
	writer, _ := gelf.NewTCPWriter(addr)
	total := 10000    // messages overall
	goroutines := 100 // concurrent writers
	for q := 0; q < goroutines; q++ {
		go func() {
			for j := 0; j < total/goroutines; j++ {
				writer.WriteMessage(&gelf.Message{
					Version:  "1.0",
					Host:     "hostname",
					Short:    "-",
					TimeUnix: float64(time.Now().UnixNano()),
					Level:    6,
					Facility: "",
				})
			}
		}()
	}
	select {}
}
```
Running `go run -race issue.go` produces a DATA RACE warning.
A bugfix is available here: https://github.com/lpflpf/go-gelf/tree/v2
I'm using go-gelf with zerolog, and for every log message I get a "short write" error. With some debugging I've figured out that the issue is in `func (w *Writer) Write(p []byte)` in gelf/writer.go.
May 3 15:23:15 localhost twas[4839]: writemessage succeeded 50requested:51written:50zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG > 34338 _typ1=prom _typ2="processing time"
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 96requested:97written:96zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG debug log
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 57requested:58written:57zerolog: could not write event: short write
May 3 15:23:17 localhost twas[4839]: 3:23PM DBG 12
May 3 15:23:17 localhost twas[4839]: writemessage succeeded 50requested:51written:50zerolog: could not write event: short write
May 3 15:23:19 localhost twas[4839]: 3:23PM DBG > 33720 _typ1=prom _typ2="processing time"
May 3 15:23:19 localhost twas[4839]: writemessage succeeded 96requested:97written:96zerolog: could not write event: short write
May 3 15:23:19 localhost twas[4839]: 3:23PM DBG debug log
When zerolog requests a write of N bytes, the gelf writer always reports N-1 bytes written. If I remove
p = bytes.TrimSpace(p)
then everything works fine.
Please consider assigning version numbers and tagging releases. Tags/releases are useful to downstream package maintainers (in Debian and other distributions) for exporting source tarballs, automatically tracking new releases, and declaring dependencies between packages. Read more in the Debian Upstream Guide.
Versioning also encourages vendoring a particular (e.g. latest stable) release, as opposed to random unreleased snapshots.
Thank you.
Under high load, the UDP Reader will drop messages, corrupt a sizeable portion of those that arrive, and even panic.
The issues are caused by GELF chunks arriving out of order or interleaved with chunks of different messages. The Reader drops some messages that arrive out of order. Others are merged with chunks from different sources, causing corruption that either produces malformed JSON (crashing the JSON parser, so the message is dropped) or returns valid JSON with corrupt contents to the user. Sometimes it even results in a panic, crashing the application.
I wrote a test case to illustrate the issues here:
https://github.com/xose/go-gelf/blob/stress/gelf/reader_test.go
The test case starts a UDP Reader and creates several goroutines to simulate multiple sources sending a large number of GELF messages to the server. It then reads those messages and verifies their integrity.
Each "log" contains a long string (of random length) of the same character, different for each source, and extra fields with the character and expected message length. These values are used to check GELF message integrity.
The GELF messages are constructed by the test case instead of using the library itself to rule out an issue on other parts of the library. For GELF messages longer than a threshold the message is chunked. No compression is used. Test values can be configured changing the constants at the top of the file.
All errors will be logged on the console, and a summary will be printed at the end, like this:
5000 sent (1627 plain, 3373 chunked), 0 refused, 4584 received, (1473 OK, 3111 errors, 167 panics)
I recently started using v2 of this package and noticed the To Do section in the README. I created an example here of using the WriteMessage function. I've verified this code works.
I'm happy to create a PR for this if you folks think this example works well enough.
I'm tracking a bug where Graylog says it has received invalid GELF JSON (TCP transport, from docker GELF logger which is using this Writer).
It looks like this `mu` Mutex (line 29 in 7ebf4f5) is never used...
Here is the calling code in the Docker logger..
https://github.com/moby/moby/blob/master/daemon/logger/gelf/gelf.go#L184
And here's an extract from my Graylog logs...
2018-03-17 09:27:23,810 ERROR: org.graylog2.inputs.codecs.GelfCodec - Could not parse JSON, first 400 characters: {"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 twr1.thewaiting-room.net (80.82.112.63:443): true","timestamp":1521278842.705,"level":3,"_command":"./main","_container_id":"95abe5c05bde85e05a460b4d2a0875d779d23e9469722846430e96bbf8e5afec","_container_name":"nats_cve20162107.3.43d8bhzvc82qkc6ir2lrziyyi","_created":"2018-03-16T17:43:35.735128942Z","_image_id":"sha2...
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('v' (code 118)): was expecting comma to separate Object entries
at [Source: {"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 twr1.thewaiting-room.net (80.82.112.63:443): true","timestamp":1521278842.705,"level":3,"_command":"./main","_container_id":"95abe5c05bde85e05a460b4d2a0875d779d23e9469722846430e96bbf8e5afec","_container_name":"nats_cve20162107.3.43d8bhzvc82qkc6ir2lrziyyi","_created":"2018-03-16T17:43:35.735128942Z","_image_id":"sha256:a70d979f79007e38f57a0b77310145102934eee96c80a4df3f03b642e76aad6e","_image_name":"registry.gitlab.com/chrishartwig/sslping/nats_cve20162107:latest","_tag":"vul{"version":"1.1","host":"france2.sslping.com","short_message":"2018/03/17 09:27:22 pip.lade.io (13.32.214.118:443): false","timestamp":1521278842.831,"level":3,"_command":"./main","_container_id":"8e68654e1ff2ec9838a10362141ac432ca0301dfa246ae52cfa8c81cbe59f9d7","_container_name":"nats_heartbleed.2.fx9blxsrii95z3w7a59t539s1","_created":"2018-03-16T17:42:22.48917432Z","_image_id":"sha256:6d1aed1d88d0b2b38f6f6435396670fa27f060656701efe86e731d3163759624","_image_name":"registry.gitlab.com/chrishartwig/sslping/nats_heartbleed:latest","_tag":"vuln"}; line: 1, column: 565]
It does occur under moderate load (13k messages in 4 min), and from the logs it looks like the two mingled messages are always from the same host, possibly from the same container (hard to tell with corrupted JSON)...
I don't know if it's a Graylog issue or a Docker logger issue (maybe the logger should use a mutex)...
What should happen if the DNS lookup for the GELF server returns multiple records?
A good default might be to round-robin between the servers to achieve load balancing. This is problematic with GELF chunking, though. All chunks for one message need to be delivered to the same server.
So if we round-robin, we have to ensure that we send all chunks (if there is more than one) to the same server.
There's currently a graylog2-server room in Gitter, but it would be nice to have one for go-gelf as well. Could some member of the Graylog2 project create this?
JSON loggers could put their fields into `Extra`; that makes logs structured and clear, as intended in Graylog.
My PR reuses the existing UnmarshalJSON function and works for me.
There may be some performance overhead from the additional parse.
This library's WriteMessage method appears to be broken: whenever I call it with any log level and a custom facility, messages are not delivered.
My code:
```go
graylogAddr := os.Getenv("GRAYLOG_ADDRESS")
gelfWriter, err := gelf.NewTCPWriter(graylogAddr)
if err != nil {
	log.Fatalf("gelf.NewTCPWriter: %s", err)
}
gelfWriter.Facility = "test"
log.SetOutput(io.MultiWriter(os.Stderr, gelfWriter))
log.Printf("This message is delivered as expected to graylog2@'%s'", graylogAddr) // works

// The following loses the message: nothing reaches Graylog, and no error is returned.
m := &gelf.Message{
	Version:  "1.1",
	Short:    "test message",
	Full:     "error details",
	TimeUnix: float64(time.Now().Unix()),
	Level:    gelf.LOG_DEBUG,
	Extra:    map[string]interface{}{"test_arg": 4},
}
if err = gelfWriter.WriteMessage(m); err == nil {
	// err is nil, yet the message never arrives
}
```
Graylog version: 5.2, clean host; added System > Source; Debug level is set in System > Logging.
We are currently only resolving the address of the GELF server when creating a new writer. This is problematic for long running processes because the IP address of the GELF server might change. Since this is UDP, the "connection" will never break and we will never notice.
To solve this, the IP address of the GELF server should be resolved again once the TTL of the DNS entry expires.
[This is in part a continuation of a discussion started in PR #9 where we were also worried about message length.]
The `message` field (sent as `short_message` in GELF) and the `full_message` field are stored in exactly the same way in Elasticsearch, so presumably they support the same sizes. I've actually tested `message` only up to 80 KB (via a null-delimited "RAW/Plaintext" interface) and 2500 lines or so, however. :-) It works fine, and the Graylog UI seems to do a reasonable job of dealing with it. (I have seen a complaint that long fields don't work, but that appears to be because they changed the default configuration to do term analysis on the `message` field, which isn't done by default.)
Having looked into this a bit, I find:
- `full_message` is optional to send in GELF, unlike `message`.
- `full_message` is not parsed for things like sources and timestamps.
- On its syslog inputs, Graylog has a `store_full_message: true` option which stores the raw syslog protocol message (usually RFC 5424) in the `full_message` field. This seems useful for debugging and handling situations where bad messages come in that the parser can't handle. We turned this on when we set up Graylog because it seemed like a good idea at the time, and it still doesn't strike me as a bad thing to do.
- Graylog search functionality seems to be oriented mainly towards `message`, and it seems to take a bit of extra work to search `full_message` instead.
From all this, I get the sense that `full_message` is considered something that can be safely ignored by clients retrieving and processing messages from Graylog. Given that, I think it's a bad idea, when given a message via the `io.Writer` interface, not to put the entire message into the `message` field.
As background, let me tell you about my use of Graylog at work. We have dozens of servers generating log messages into the Systemd Journal that we forward to a central Graylog installation so we can easily query across servers, do centralized analysis (such as with OSSEC LIDS log analysis), keep historical records, and so on.
In order to keep things as simple as possible, I'm aiming to get all our non-journal logging also going into the journal, including logs from programs that write directly to /var/log such as auditd, output of jobs run by cron, and so on. (For example, I'm intending to replace /usr/sbin/sendmail on those servers with a program that eats the message and logs it to the journal.)
One of the questions I had to ask myself is whether I should be using the `full_message` field, and what with various programs partially duplicating information across `message` and `full_message`, that seems like a bad idea. How would the OSSEC log scanner, for example, know that for some log entries `full_message` is all the information in `message` and a few other fields in unparsed form, yet for other messages `message` could be an incomplete log message and `full_message` contains the complete message but without other log entry information such as the timestamp? That way lies a lot of pain, especially if I end up having to assemble new versions of log entries for log-scanning software that expects things to be re-assembled in a syslog-ish format with the timestamp, source, etc. all in one big string.
So for these reasons, I'm suggesting we change the `io.Writer` API, `Write([]byte)`, of go-gelf to record the entire bytestring it's handed in the `message` field and not send a `full_message` field. I think it's reasonable for it to continue adding things like the extra fields for file and line number of the caller.
Note that if anybody does want to use the `full_message` field for something, or any other fields for that matter, we have the `Write(*Message)` interface for that, which allows (or should allow) full control over the GELF message being sent, within the limitations of the protocol. (We should not allow, e.g., a GELF message to be sent without a `message` field, since that field is required by the protocol.)
If we're agreed on this, I'm happy to submit a PR for changes to the code and update the documentation appropriately, including this rationale in an appropriate place (probably the commit message).
Hello,
It would be very nice to have support for TLS TCP inputs. I'm not sure what the status of the `tls` branch is.