logstash-filter-grok's Issues

Why is this filter "slow"? A brief analysis

Total time to process the same apache log line 1_000_000 times with %{COMMONAPACHELOG}: 93.88s (~10k e/s)

The 6 biggest time offenders use 70.46s - 75% of total time:

Action               Time (s)   % of Total   Times called per event
Grok#handle (self)   21.16      22%          10x
Regexp#match         19.48      20%          1x
Event#[]=            15.58      16%          9x
Event#[]              8.51       9%          10x
Logger.debug?         5.73       6%          3x
Proc#call             4.54       4%          11x

Note: this is using:

  • logstash-core master
  • logstash-core-event-java master
  • logstash-filter-grok master

Ruby script used to test:

# encoding: utf-8
require 'jruby/profiler'

module LogStash
end

module LogStash::Environment
  # running the grok code outside a logstash package means
  # LOGSTASH_HOME will not be defined, so let's set it here
  # before requiring the grok filter
  unless self.const_defined?(:LOGSTASH_HOME)
    LOGSTASH_HOME = File.expand_path("../../../", __FILE__)
  end

  # also :pattern_path method must exist so we define it too
  unless self.method_defined?(:pattern_path)
    def pattern_path(path)
      ::File.join(LOGSTASH_HOME, "patterns", path)
    end
  end
end

require 'logstash/event'
require 'logstash/environment'
require 'logstash/filters/grok'
require 'logstash/codecs/base'

filter = LogStash::Filters::Grok.new(
  "match" => { "message" => "%{COMMONAPACHELOG}" },
)
filter.register

print "generating data.."
data = 1_000_000.times.map do
  message = '198.46.149.143 - - [04/Jun/2015:02:29:31 +0000] "GET /blog/geekery/solving-good-or-bad-problems.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29 HTTP/1.1" 200 10756 "-" "Tiny Tiny RSS/1.11 (http://tt-rss.org/)"'
  LogStash::Event.new("message" => message)
end
puts "done. benchmarking..."

profile_data = JRuby::Profiler.profile do
  data.each do |event|
    filter.filter(event)
  end
end
profile_printer = JRuby::Profiler::GraphProfilePrinter.new(profile_data)
profile_printer.printProfile(STDOUT)

The full profile.graph output can be found here

IP pattern is very slow because of the IPV6 regexp

test script:

# encoding: utf-8
require 'logstash/event'
require 'logstash/environment'
require 'spec/filters/grok_spec'

grok_base = LogStash::Filters::Grok.new(
  "match" => ["message", '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'])

grok_fast = LogStash::Filters::Grok.new(
  "match" => ["message", '%{IPV4ORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'])

grok_base.register
grok_fast.register

def benchmark(filter)
  t = Time.now
  total = 0
  File.open("logs", "r") do |file|
    file.each_line do |line|
      event = LogStash::Event.new("message" => line)
      filter.filter(event)
      bytes = event["[bytes]"]
      total += bytes if bytes
    end
  end
  puts Time.now - t
  puts total
end

puts "warmup base"
benchmark(grok_base)

puts "benchmark base"
benchmark(grok_base)

puts "warmup fast"
benchmark(grok_fast)

puts "benchmark fast"
benchmark(grok_fast)

where IPV4ORHOST is a custom pattern defined as IPV4ORHOST (?:%{IPV4}|%{HOSTNAME})

result:

% bundle exec ruby -J-Xmx6g bench.rb
warmup base
55.881
13026527862
benchmark base
54.502
13026527862
warmup fast
28.809
13026527862
benchmark fast
28.765
13026527862

This means that if you know you don't have IPv6 addresses, you can get twice the throughput.
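
For reference, here is a minimal sketch, in the same standalone style as the test script above, of how the custom IPV4ORHOST pattern could be supplied through patterns_dir; the patterns file location is an assumption, not part of the original benchmark:

# hypothetical file ./patterns/extra containing the single line:
#   IPV4ORHOST (?:%{IPV4}|%{HOSTNAME})
grok_fast = LogStash::Filters::Grok.new(
  "patterns_dir" => ["./patterns"],   # assumed location of the custom pattern file
  "match" => ["message", "%{IPV4ORHOST:clientip} %{USER:ident} %{USER:auth}"]   # extend with the rest of the pattern as above
)
grok_fast.register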

Grok matching syslog

(This issue was originally filed by @luongvinhthao at elastic/logstash#1846)


I use Rsyslog to send logs to logstash. With a grok filter like match => { "message" => " %{GREEDYDATA:OS_message}" } I get OS_message: controller 2014-10-06 12:27:47.536 1142 WARNING keystone.common.controller [-] RBAC: Bypassing authorization.
But when I change it to match => { "message" => " %{HOSTNAME:hostname} %{GREEDYDATA:OS_message}" }, I get hostname 00. When I test the same pattern with grokdebug, I get hostname controller. Can anybody help me explain this?

Strange behavior of grok pattern

(This issue was originally filed by @Fervid at elastic/logstash#2368)


Hello.

I wrote a pattern to parse the following ISO8601 timestamp: 2015-01-15 06:33:09 +0000
I am going to use (?x) mode, which is why I use an explicit SPACE pattern.

pattern file:

SPACE \s
#FIRST CASE
#ISO8601_TIMESTAMP   %{ISO8601_DATE}  (?: [tT] | %{SPACE})   %{ISO8601_TIME}  %{SPACE}  (?: %{ISO8601_TZD_CODE:start_tzd_code} | %{ISO8601_TZD_SIGN:sign} %{HOUR:start_tzd_hour} (?: :? %{MINUTE:start_tzd_minute})?)
#SECOND CASE
#ISO8601_TIMESTAMP   %{ISO8601_DATE}  (?: [tT] | %{SPACE})   %{ISO8601_TIME}  %{SPACE}  %{ISO8601_TIMEZONE}
ISO8601_DATE        %{YEAR:start_year} \- %{MONTHNUM:start_month} \- %{MONTHDAY:start_day}
ISO8601_TIME        %{HOUR:start_hour}  :?  %{MINUTE:start_minute}  (?: :? %{SECOND:start_second})?
ISO8601_TZD_SIGN    [+-]
ISO8601_TZD_CODE    [zZ]
ISO8601_TIMEZONE    (?: %{ISO8601_TZD_CODE:start_tzd_code} | %{ISO8601_TZD_SIGN:sign} %{HOUR:start_tzd_hour} (?: :? %{MINUTE:start_tzd_minute})?)

I have two cases, both marked by comments in the pattern file. These two cases behave differently!
FIRST CASE:

/opt/logstash-1.4.2/bin/logstash -e 'input {stdin {}} filter{ grok { match => [ "message", "(?x)%{ISO8601_TIMESTAMP}" ] }} output { stdout { codec => rubydebug }}'
2015-01-15 06:33:09 +0000
{
             "message" => "2015-01-15 06:33:09 +0000",
            "@version" => "1",
          "@timestamp" => "2015-01-18T12:16:01.108Z",
                "host" => "alerts-db",
          "start_year" => "2015",
         "start_month" => "01",
           "start_day" => "15",
          "start_hour" => "06",
        "start_minute" => "33",
        "start_second" => "09",
                "sign" => "+",
      "start_tzd_hour" => "00",
    "start_tzd_minute" => "00"
}

SECOND CASE:

/opt/logstash-1.4.2/bin/logstash -e 'input {stdin {}} filter{ grok { match => [ "message", "(?x)%{ISO8601_TIMESTAMP}" ] }} output { stdout { codec => rubydebug }}'
2015-01-15 06:33:09 +0000
{
         "message" => "2015-01-15 06:33:09 +0000",
        "@version" => "1",
      "@timestamp" => "2015-01-18T12:17:09.488Z",
            "host" => "alerts-db",
      "start_year" => "2015",
     "start_month" => "01",
       "start_day" => "15",
      "start_hour" => "06",
    "start_minute" => "33",
    "start_second" => "09"
}

Why, in the second case, don't I get

                "sign" => "+",
      "start_tzd_hour" => "00",
    "start_tzd_minute" => "00"
?

Where is my mistake?

Feature Request: Recursively Grok lines and streams

I'm looking to use grok to parse through lines and streams of data. I'll explain how.

Let's say I have a line of data of:

221.37.88.36.bc.googleusercontent.com,63.88.73.122,63.88.73.0,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US

We can see there is some noticeable information in there, such as IP addresses, hostnames, city, state, and country.

I'm trying to make a grok parser that extracts data out of this line incrementally, where each grok filter removes what it parsed out and feeds the remainder into the next grok filter.

For example:

Let's take an input from a TCP port

input {
   tcp { port => "4382" }
}

And feed it through grok

filter {
    # GROK PARSER 01
    grok {
        match => { "message" => "%{HOSTNAME:Hostname}" }    # This will parse out all hostnames from the line
    }
    # GROK PARSER 02
    grok {
        match => { "message" => "%{IPV4:IP}" }    # This will parse out all IPV4 addresses from the line
    }
    # GROK PARSER 03
    grok {
        match => { "message" => "%{GREEDYDATA:data}" }    # This will encapsulate the rest of the information
    }
}

After GROK PARSER 01, we'll end up parsing out anything that is a hostname

            "message" => [
        [0] "221.37.88.36.bc.googleusercontent.com,63.88.73.122,63.88.73.0,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US"
    ],
           "@version" => "1",
         "@timestamp" => "2015-07-15T19:33:46.261Z",
           "Hostname" => "221.37.88.36.bc.googleusercontent.com"
}

Then, GROK PARSER 02, will parse any IPV4 address

            "message" => [
        [0] ",63.88.73.122,63.88.73.0,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US"
    ],
           "@version" => "1",
         "@timestamp" => "2015-07-15T19:33:46.261Z",
                 "IP" => [
        [0] "63.88.73.122",
        [1] "63.88.73.0"
    ]
}

And, lastly, GROK PARSER 03, will hold what's left

            "message" => [
        [0] ",,,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US"
    ],
           "@version" => "1",
         "@timestamp" => "2015-07-15T19:33:46.261Z",
               "data" => ",,,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US"
}

How can we make this happen?
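
As a rough illustration of the requested "parse, strip, pass the remainder on" behaviour in plain Ruby (the regexes below are simplified stand-ins chosen for this sample line, not the real HOSTNAME/IPV4 grok patterns):

line = "221.37.88.36.bc.googleusercontent.com,63.88.73.122,63.88.73.0,-,,-,Google Inc.,Mountain View,CA,US,Google Inc.,Mountain View,CA,US"

hostname_re = /\b[\w.-]+\.[a-zA-Z]{2,}\b/     # simplified: dotted name ending in an alphabetic label
ipv4_re     = /\b(?:\d{1,3}\.){3}\d{1,3}\b/   # simplified IPv4

hostnames = line.scan(hostname_re)            # pass 1: extract hostnames...
line      = line.gsub(hostname_re, "")        # ...and remove them from the remainder
ips       = line.scan(ipv4_re)                # pass 2: extract IPv4 addresses
line      = line.gsub(ipv4_re, "")
data      = line                              # pass 3: whatever is left over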

tests fail because LogStash::Environment.pattern_path is not in logstash-core

Running the specs fails for logstash-core >= 2.2:

/tmp/logstash-filter-grok (git)-[master] % bundle exec rspec
Using Accessor#strict_set for specs
NoMethodError: undefined method `pattern_path' for LogStash::Environment:Module
             Grok at /private/tmp/logstash-filter-grok/lib/logstash/filters/grok.rb:226
           (root) at /private/tmp/logstash-filter-grok/lib/logstash/filters/grok.rb:139
          require at org/jruby/RubyKernel.java:1040
          require at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/polyglot-0.3.5/lib/polyglot.rb:65
           (root) at /private/tmp/logstash-filter-grok/spec/filters/grok_spec.rb:1
             load at org/jruby/RubyKernel.java:1059
           (root) at /private/tmp/logstash-filter-grok/spec/filters/grok_spec.rb:11
             each at org/jruby/RubyArray.java:1613
           (root) at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/configuration.rb:1
  load_spec_files at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/configuration.rb:1105
  load_spec_files at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/configuration.rb:1105
            setup at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/runner.rb:96
              run at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/runner.rb:84
              run at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/runner.rb:69
             load at org/jruby/RubyKernel.java:1059
           invoke at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/gems/rspec-core-3.1.7/lib/rspec/core/runner.rb:37
             eval at org/jruby/RubyKernel.java:1079
           (root) at /Users/joaoduarte/.rvm/gems/jruby-1.7.23/bin/jruby_executable_hooks:15

This happens because LogStash::Environment.pattern_path is not defined in the logstash-core gem but instead in the bootstrapping code.
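
A spec-side shim along the lines of the workaround already used in the profiling script earlier on this page could unblock the suite until the plugin stops relying on this method; a minimal sketch (the plugin-root path is an assumption):

# e.g. at the top of spec/spec_helper.rb, before the grok filter is required
require "logstash/environment"

module LogStash::Environment
  unless self.const_defined?(:LOGSTASH_HOME)
    LOGSTASH_HOME = File.expand_path("../..", __FILE__)   # assumed: the plugin checkout root
  end

  unless self.method_defined?(:pattern_path)
    def pattern_path(path)
      ::File.join(LOGSTASH_HOME, "patterns", path)
    end
  end
end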

grok should ignore tilde backup files when processing patterns_dir

(This issue was originally filed by @mrec at elastic/logstash#2271)


(This comes from the discussion of #2244)

When testing a config using grok and custom patterns, a user will often be editing pattern definition files in patterns_dir between run attempts. Many (most?) Linux-ey text editors create backup files, named as the original filename plus a ~ suffix, in the same location as the original; even though they aren't hidden these are often invisible by default in file browsers. When dealing with multiple pattern definition files, and especially when renaming them, it's possible to have a lot of these tilde files lying around after a while.

grok currently reads everything in patterns_dir, including any tilde backups. It quite reasonably doesn't define the order in which it reads them, and it doesn't warn if e.g. the definition of MYPATTERN in a stale patterns~ or previousfilename~ backup file overrides the definition of MYPATTERN in patterns. Hilarity ensues. Also hair-tearing, teeth-gnashing, bad language and various other undesirable outcomes.

I propose that grok should ignore any files in patterns_dir ending in a ~. There may be other things it'd be beneficial to blacklist too, but this seems like a good start.
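
One possible shape for that filtering, as a hedged sketch rather than the plugin's actual loader code:

patterns_dir  = "/etc/logstash/patterns"                 # example directory
pattern_files = Dir.glob(File.join(patterns_dir, "*")).reject do |path|
  File.basename(path).end_with?("~")                     # skip editor backup files such as patterns~
end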

The 'tag_on_failure' attribute doesn't interpolate template variables

It would be a useful feature if the tag_on_failure attribute interpolated dynamic template variables. For example, given a template variable type = syslog:

       grok {
          match => [
            "message", "%{SYSLOGBASE:syslog_data}\s+%{GREEDYDATA:message}"
          ]
          overwrite => [ "message" ]
          tag_on_failure => ["_grokparsefailure_%{type}"]
        }

This should tag the event with _grokparsefailure_syslog, but it currently shows up as _grokparsefailure_%{type}.
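
Functionally, the request boils down to passing each configured tag through the event's sprintf before it is added on failure; a small standalone sketch of that interpolation (not the plugin's current code):

require "logstash/event"

event = LogStash::Event.new("type" => "syslog")
tag_on_failure = ["_grokparsefailure_%{type}"]

tags = event["tags"] || []
tag_on_failure.each do |t|
  interpolated = event.sprintf(t)   # "_grokparsefailure_%{type}" -> "_grokparsefailure_syslog"
  tags << interpolated unless tags.include?(interpolated)
end
event["tags"] = tags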

add_tag causes incorrect behaviour when using multiple patterns

using logstash 2.0

OK

 grok {
     # failure
     match => { "message" => ["%{SSH_AUTH_1}","%{SSH_AUTH_2}","%{SSH_AUTH_3}"] }
     patterns_dir => "/opt/elk/PRODSEC/FS/conf/logstash/patterns"
 }

message:Failed password for invalid user tony from 192.168.1.35 port 53652 ssh2 @version:1 @timestamp:November 26th 2015, 15:08:01.000 beat.hostname:w530 beat.name:w530 count:1 fields: - input_type:log offset:178,307 source:/var/log/auth.log type:auth timestamp:Nov 26 15:08:01 logsource:w530 program:sshd pid:20280 user:tony src_ip:192.168.1.35 src_port:53652 auth_type:ssh2 _id:AVFEH3K0kV3T0hSZq2e0 _type:auth _index:logstash-auth-2015.11.26 _score:

BUG (_grokparsefailure is added as a tag despite correct pattern matching)

 grok {
     # failure
     match => { "message" => ["%{SSH_AUTH_1}","%{SSH_AUTH_2}","%{SSH_AUTH_3}"] }
     patterns_dir => "/opt/elk/PRODSEC/FS/conf/logstash/patterns"
     add_tag => ["auth_fail"]
 }

message:Failed password for invalid user tony from 192.168.1.35 port 55531 ssh2 @version:1 @timestamp:November 26th 2015, 16:12:44.000 beat.hostname:w530 beat.name:w530 count:1 fields: - input_type:log offset:182,004 source:/var/log/auth.log type:auth timestamp:Nov 26 16:12:44 logsource:w530 program:sshd pid:811 user:tony src_ip:192.168.1.35 src_port:55531 auth_type:ssh2 tags:auth_fail, _grokparsefailure _id:AVFEWrQzkV3T0hSZrRn1 _type:auth _index:logstash-unparsed-2015.11.26 _score:

Maybe support date parsing?

Often users are confused that matching a timestamp text doesn't actually inform logstash as to the correct time the event occurred.

Maybe something like %{MYTIMEPATTERN:+time_format} would imply capturing the text and parsing it with time_format and setting @timestamp?

Anyone have thoughts?

Parse Failure slows down entire agent

(This issue was originally filed by @galvinograd at elastic/logstash#1678)


When logstash fails to parse a large log file with a large grok parse template, it slows down the entire agent's performance significantly. On the other hand, when the parsing succeeds, logstash is orders of magnitude faster.
Maybe it has something to do with exception handling?

Creating a Field from Multiple Default Grok Patterns.

I'm looking to use multiple grok patterns that, when matched, will be the value for a new field.

For example, here's a log string that I'm working with:

8/19/2014 18:53,6/16/2015 4:21

I want:

"firstSeen" => "8/19/2014 18:53"
"lastSeen" => "6/16/2015 4:21"

Since there isn't an existing grok pattern to match 8/19/2014 18:53 as a timestamp, I can still match it by using individual portions of the existing grok patterns library, such as %{DATE} %{HOUR} %{MINUTE}, but grok doesn't work that way:

grok {
   match =>  [ "message", "%{DATE:firstSeen} %{HOUR:firstSeen}:%{MINUTE:firstSeen}" ]
}

However, I'd like the firstSeen tag to be a concatenation of %{DATE} %{HOUR}:%{MINUTE}

I'm looking to do something like this:

%{(%{DATE} %{HOUR}:%{MINUTE}):firstSeen}

Is that possible?

Edit: I know about the date filter, but there are two timestamps inside the message that need to be parsed out, and for visualizations both fields are needed.

Edit 2: I'm wondering if this will work:

(?<firstSeen>%{DATE} %{HOUR}:%{MINUTE})

Edit 3: I ran the pattern through http://grokconstructor.appspot.com/do/match and it matched. This may just work. If it does, a-w-e-s-o-m-e.
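
A quick way to check Edit 2 locally, in the same standalone style as the scripts earlier on this page (a sketch; the expected values are what the named-capture syntax should produce, not verified output):

grok = LogStash::Filters::Grok.new(
  "match" => { "message" => "(?<firstSeen>%{DATE} %{HOUR}:%{MINUTE}),(?<lastSeen>%{DATE} %{HOUR}:%{MINUTE})" }
)
grok.register
event = LogStash::Event.new("message" => "8/19/2014 18:53,6/16/2015 4:21")
grok.filter(event)
puts event["firstSeen"]   # expected: "8/19/2014 18:53"
puts event["lastSeen"]    # expected: "6/16/2015 4:21"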

Extract URL from Text Field into a New Field Called URL

I'm inputting a field called text. This field may at times contain a URL. What I would like to do is extract the URLs from text and put them in a new field called URL.

I tried grok, but it seems like grok patterns need a specific log format in order to work. For example, the following will work:

          5546 hello www.google.com
          {id} {text} {URL}

But the following wouldn't

          4324 hello my name is Ryan www.yahoo.com
          {id} {text} {URL}

Instead, it would take hello as text and not take www.yahoo.com as the URL. Is there a way around this? Please note that sometimes the text might look like the following:

          www.gmail.com hello everyone

Shouldn't grok be able to extract any pattern from a field regardless of its place in the field? Could this be an enhancement, or is it not possible?

push 0.1.7

The head of logstash is blowing up with Bundler::VersionConflict: Could not find gem 'logstash (< 2.0.0, >= 1.4.0) java', which is required by gem 'logstash-filter-grok (>= 0) java', in any of the sources.

grok applies tag_on_failure if last pattern doesn't match

If you have a grok filter with multiple matches and break_on_match set to false, the event will have tag_on_failure applied unless the last pattern matches. This is a change from 1.4.

filter{
    grok {
        match => { "message" => [
            "foo",
            "bar"
         ] }
        tag_on_failure => [ "failure" ]
        break_on_match => false
    }
}

Sending in input of:

foo
bar
spam

Gives:

{
   "message" => "foo",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.454Z",
      "host" => "0.0.0.0",
      "tags" => [
    [0] "failure"
]
}
{
   "message" => "bar",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.456Z",
      "host" => "0.0.0.0"
}

{
   "message" => "spam",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.627Z",
      "host" => "0.0.0.0",
      "tags" => [
    [0] "failure"
]
}

In 1.4, only "spam" would be tagged as "failure".

How to dump final grok regexes?

(This issue was originally filed by @PAStheLoD at elastic/logstash#2049)


It'd be very handy to get the resulting PCRE expressions for filter development and debugging purposes. (Since there are quite friendly online tools for interacting with regex as opposed to the slow trial-and-error restart-driven development.)

Thanks!
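
Outside of a running pipeline, the underlying jls-grok gem can already expand a pattern into its final regular expression; a minimal sketch (the pattern file path is an assumption):

require "grok-pure"   # the jls-grok gem used by this filter

grok = Grok.new
grok.add_patterns_from_file("patterns/grok-patterns")   # assumed path to the core patterns file
grok.compile("%{COMMONAPACHELOG}")
puts grok.expanded_pattern                              # the fully expanded regular expression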

Exception in Grok Plugin

I have confirmed this is a problem in logstash-2.1.0 as well.

I am using logstash-1.4.2 and found a grok exception happening for some of our training and testing files.
Can anyone please confirm whether the fix is correct? And why exactly does the exception happen?

I found the following:

The function match in the file logstash-1.4.2/lib/logstash/filters/grok.rb needs to be changed. The change is highlighted below. I basically added a return false in the rescue exception handler, plus some logging information. The exception seems to happen for the .* expression, probably because of the larger state machine, but I need to investigate this further. I think (although I am not sure) this is only happening for unmatched patterns. For now we can take the default option of "unmatched" for any exception happening in logstash. Hopefully these exceptions will be uncommon.

One additional thing to note is that the same line can throw multiple exceptions. I confirmed this by looking at the debug output I added in the exception handler.

private
def match(grok, field, event)
  input = event[field]
  if input.is_a?(Array)
    success = true
    input.each do |input|
      grok, match = grok.match(input)
      if match
        match.each_capture do |capture, value|
          handle(capture, value, event)
        end
      else
        success = false
      end
    end
    return success
  #elsif input.is_a?(String)
  else
    # Convert anything else to string (number, hash, etc)
    grok, match = grok.match(input.to_s)
    return false if !match

    match.each_capture do |capture, value|
      handle(capture, value, event)
    end
    return true
  end

rescue StandardError => e
  @logger.warn('------')
  @logger.warn("Grok regexp threw exception", :exception => e.message)
  @logger.warn(' The input is ')
  @logger.warn(input)
  @logger.warn('------')
  return false
end

Windows test failure

From elastic/logstash#2487

  39) LogStash::Filters::Grok break_on_match = true (default) for array input with multiple grok pattern "{"message":["hello world","line 23"]}" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected "hello", but got nil
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  40) LogStash::Filters::Grok singles with duplicate-named fields "hello world" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       NilClass is not a String
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  41) LogStash::Filters::Grok empty fields drop by default "1=test" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected nil, got ["_grokparsefailure"]
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  42) LogStash::Filters::Grok empty fields keep if keep_empty_captures is true "1=test" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected nil, got ["_grokparsefailure"]
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  43) LogStash::Filters::Grok processing selected fields "{"message":"hello world","examplefield":"12345"}" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected "hello", but got nil
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  44) LogStash::Filters::Grok allow dashes in capture names "hello world" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected "hello", but got nil
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  45) LogStash::Filters::Grok when named_captures_only == false "Hello World, yo!" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected "WORD" in #<LogStash::Event:0x123c8f2 @metadata={}, @accessors=#<LogStash::Util::Accessors:0x5da00b @store={"message"=>"Hello World, yo!", "@version"=>"1", "@timestamp"=>"2015-01-30T22:12:55.202Z", "tags"=>["_grokparsefailure"]}, @lut={"message"=>[{"message"=>"Hello World, yo!", "@version"=>"1", "@timestamp"=>"2015-01-30T22:12:55.202Z", "tags"=>["_grokparsefailure"]}, "message"], "tags"=>[{"message"=>"Hello World, yo!", "@version"=>"1", "@timestamp"=>"2015-01-30T22:12:55.202Z", "tags"=>["_grokparsefailure"]}, "tags"], "WORD"=>[{"message"=>"Hello World, yo!", "@version"=>"1", "@timestamp"=>"2015-01-30T22:12:55.202Z", "tags"=>["_grokparsefailure"]}, "WORD"]}>, @logger=#<Cabin::Channel:0x16169da @subscriber_lock=#<Mutex:0x6ca36a>, @data={}, @metrics=#<Cabin::Metrics:0x5cac5f @channel=#<Cabin::Channel:0x16169da ...>, @metrics={}, @metrics_lock=#<Mutex:0x38b0f5>>, @subscribers={}, @level=:info>, @data={"message"=>"Hello World, yo!", "@version"=>"1", "@timestamp"=>"2015-01-30T22:12:55.202Z", "tags"=>["_grokparsefailure"]}, @metadata_accessors=#<LogStash::Util::Accessors:0xc46d2 @store={}, @lut={}>, @cancelled=false>
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  46) LogStash::Filters::Grok patterns in custom dir override those in 'patterns/' dir "{"message":"0"}" when processed
     Failure/Error: Unable to find matching line from backtrace
     Errno::EACCES:
       Permission denied - C:/Users/jls/AppData/Local/Temp/d20150130-1484-3g3zk8/grok20150130-1484-14bwd5b
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  47) LogStash::Filters::Grok break_on_match = false for array input with multiple grok pattern "{"message":["hello world 123","line 23"]}" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected ["hello", "line"], but got nil
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

  48) LogStash::Filters::Grok break_on_match = false for array input with multiple grok pattern "{"message":["hello world","line 23"]}" when processed
     Failure/Error: Unable to find matching line from backtrace
     Insist::Failure:
       Expected ["hello", "line"], but got nil
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:57:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:112:in `run'
     # C:\Users\jls\Documents\GitHub\logstash\lib\logstash\runner.rb:170:in `run'

Filtering on array throws exception

I'm using logstash 1.4.2.

I'm joining two log messages

Feb 16 13:32:20 dhcp-ahof dhcpd: DHCPID c47e5a92
Feb 16 13:32:20 dhcp-ahof dhcpd: DHCPDISCOVER from 54:52:00:42:7b:fb (Some computer) via 192.168.42.1

using multiline:

filter {
  if [type] == "syslog" {
      # get transaction ids
      multiline {
        pattern => "DHCPID"
        what => "next"
      }
  }
}

After that I try to grok the resulting event.
For example tagging it:

filter {
  if [type] == "syslog" {
    if [syslog_program] == "dhcpd" {
      grok {
        break_on_match => true
          match => { "syslog_message" => "DHCPDISCOVER" }
          add_tag => "dhcp"
          add_tag => "dhcp-discover"
          tag_on_failure => []
      }
    }
  }
}

This results in the following filter result (all following grok filters match):

Feb 16 13:32:20 dhcp-ahof dhcpd: DHCPID c47e5a92
Feb 16 13:32:20 dhcp-ahof dhcpd: DHCPDISCOVER from 54:52:00:42:7b:fb (Some computer) via 192.168.42.1
offset      {
                 "message" => "Feb 16 13:32:20 dhcp-ahof dhcpd: DHCPID c47e5a92\nFeb 16 13:32:20 dhcp-ahof dhcpd: DHCPDISCOVER from 54:52:00:42:7b:fb (Some computer) via 192.168.42.1",
                "@version" => "1",
              "@timestamp" => "2015-02-16T13:09:55.081Z",
                    "type" => "syslog",
                    "host" => "noc-log",
        "syslog_timestamp" => "Feb 16 13:32:20",
         "syslog_hostname" => "dhcp-server1",
          "syslog_program" => "dhcpd",
          "syslog_message" => [
        [0] "DHCPID c47e5a92",
        [1] "DHCPDISCOVER from 54:52:00:42:7b:fb (Some computer) via 192.168.42.1"
    ],
             "received_at" => [
        [0] "2015-02-16 13:09:55 UTC",
        [1] "2015-02-16 13:09:59 UTC"
    ],
           "received_from" => "noc-log",
    "syslog_severity_code" => 5,
    "syslog_facility_code" => 1,
         "syslog_facility" => "user-level",
         "syslog_severity" => "notice",
                    "tags" => [
        [ 0] "multiline",
        [ 1] "dhcp",
        [ 2] "dhcp-discover",
        [ 3] "dhcp",
        [ 4] "dhcp-offer",
        [ 5] "dhcp",
        [ 6] "dhcp-request",
        [ 7] "dhcp",
        [ 8] "dhcp-acknowledge",
        [ 9] "dhcp",
        [10] "dhcp-nak",
        [11] "dhcp",
        [12] "dhcp-decline",
        [13] "dhcp",
        [14] "dhcp-release",
        [15] "dhcp",
        [16] "dhcp-inform"
    ]
}

This seems to be caused by grok throwing an exception when parsing the array field "syslog_message":

{:timestamp=>"2015-02-16T14:09:59.950000+0100", :message=>"Grok regexp threw exception", :exception=>"undefined method `match' for false:FalseClass", :level=>:warn, :file=>"logstash/filters/grok.rb", :line=>"331"}
{:timestamp=>"2015-02-16T14:09:59.952000+0100", :message=>"filters/LogStash::Filters::Grok: adding tag", :tag=>"dhcp", :level=>:debug, :file=>"logstash/filters/base.rb", :line=>"182"}
{:timestamp=>"2015-02-16T14:09:59.953000+0100", :message=>"filters/LogStash::Filters::Grok: adding tag", :tag=>"dhcp-discover", :level=>:debug, :file=>"logstash/filters/base.rb", :line=>"182"}

This seems to be related to https://logstash.jira.com/browse/LOGSTASH-1710.

Grok: Maybe allow defining the type *in* the pattern definition

(This issue was originally filed by @jordansissel at elastic/logstash#1859)


Problem: Many users do things like %{NUMBER:bytes} in grok and then are confused why Elasticsearch fails to do statistics or other numeric aggregations on it. The cause is that Grok only does strings by default and Elasticsearch is sent a string and maps 'bytes' to a string - and this is confusing.

I'm tired of users tripping over this problem. I would be willing to add a feature to grok that allowed you to define the 'type' of a pattern inside the pattern definition.

Background: In a grok patterns file, you can define a pattern with NAME PATTERN syntax (name of pattern, space, the regexp pattern).

Proposal: Allow the type to accompany the NAME.

By way of example, if we were to fix this NUMBER problem permanently, we would define the new pattern like this:

NUMBER:float (%{BASE10NUM})

The new syntax is NAME:TYPE REGEXP and is backwards-compatible with the old syntax (The :TYPE is made optional and defaults to string if not provided).

This would allow us to more reasonably define the patterns with their respective types such that this will be captured as a numeric type in Elasticsearch: %{NUMBER:bytes}

It's not clear if this will solve everything, though, since in some cases like 'bytes' the value is never fractional, so users doing %{NUMBER:bytes} and seeing a float may be confused because they wanted to see a long type in Elasticsearch.

Thoughts?

Max grok pattern size

We have a logstash grok filter with a single match => {message => '%{PATTERN}'}, where PATTERN is made out of several other patterns joined with | (i.e. a grok file with PATTERN %{PAT1}|%{PAT2}; and each sub-pattern also a combination of more patterns).

Recently we added a new pattern to the joined list, and logstash started to consume large amounts of CPU after a while (30 minutes or so; it parsed a few messages with the new pattern before that, so the new pattern itself seems fine).

But maybe we hit some internal threshold/buffer size/... Is there a limit to the size of a single pattern in the match => message? We could split the patterns and use match => { message => ['PAT1', 'PAT2', ...] }, but would it improve anything?

I also found #37, but I don't think it's related. The pattern does have a GREEDYDATA at the end, but because it is at the end, I think it shouldn't matter (the new pattern looks like uid:%{INT:uid:int} sid:%{INT:sid:int} tty:%{DATA:tty} cwd:%{UNIXPATH:cwd} filename:%{UNIXPATH:executable}: %{GREEDYDATA:command}).

Plugin breaks configtest

With the current beta this happens on configtest:

grok must set a milestone. For more information about plugin milestones, see http://logstash.net/docs/1.5.0.beta1/plugin-milestones

Since the geoip plugin breaks too, apparently logstash plugin update is currently not to be used, right?

Numeric semantic conversion in grok appear to work on longs as well

The following documentation snippet may be outdated/misleading.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.

Turns out it actually works on values larger than int32 max (2147483647).

For example, the following works on the string 9223372036854775801, and via dynamic mapping it will create the field as a long in Elasticsearch.

  grok {
    match => { "message" => "%{NUMBER:big_number:int}" }
  }

Add timeout option to grok filter

We have users sending in occasional large messages of many MBs, and if they have not implemented logic in the pipeline to filter out these messages, the grok filter will run and peg the CPU threads for a very long time. It would be helpful to have a default (or configurable) timeout for grok filters so that matching gives up at some point.
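
For reference, later versions of this plugin expose a timeout_millis setting that does exactly this; whether it is available depends on the plugin version, so check the docs for your release. A sketch of how it would be configured, in the standalone style used elsewhere on this page:

grok = LogStash::Filters::Grok.new(
  "match"          => { "message" => "%{COMMONAPACHELOG}" },
  "timeout_millis" => 5000   # give up matching a single event after 5 seconds
)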

"?:" meaning in grok

Hi,
does "?:" have the same meaning as in regex, or does grok give it a special meaning? I found that the logstash grok base pattern for apache has

"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})"

What does "?:" mean here? I needed it to parse some custom apache logs and wanted to know its importance.
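
For context, (?: ... ) is plain regular-expression syntax for a non-capturing group; grok passes it straight through to the regex engine. A tiny Ruby illustration:

"GET /index.html HTTP/1.1" =~ /(?:GET|POST) (\S+)/
puts $1   # "/index.html" -- the (?: ) group matched but did not create a capture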

grok Pattern in file, should trim ending spaces

(This issue was originally filed by @woodyyou at elastic/logstash#1843)


I put grok patterns in a file and didn't realize there was a space at the end. The result was that nothing matched. It took me quite some time to figure out that this was the problem (I needed rubydebug to tell there was a parsing error; even -v didn't say anything).

Since leading spaces do not matter, trailing spaces should not matter either. I also think it's a common mistake.

Unit test failure with some exceptions

RSpec tests fail with this outcome:

  logstash-filter-grok git:(master) bundle exec rspec
Using Accessor#strict_set for specs
Run options: exclude {:redis=>true, :socket=>true, :performance=>true, :couchdb=>true, :elasticsearch=>true, :elasticsearch_secure=>true, :broken=>true, :export_cypher=>true, :integration=>true, :windows=>true}
.......................
An error occurred in an after hook
  NoMethodError: undefined method `unlink' for nil:NilClass
  occurred at /Users/purbon/work/purbon/infra_tools/clones/logstash-filter-grok/spec/filters/grok_spec.rb:692:in `(root)'

F.......................................................
An error occurred in an after hook
  NoMethodError: undefined method `unlink' for nil:NilClass
  occurred at /Users/purbon/work/purbon/infra_tools/clones/logstash-filter-grok/spec/filters/grok_spec.rb:667:in `(root)'

F....

Failures:

  1) LogStash::Filters::Grok patterns in custom dir override those in 'patterns/' dir "{"message":"0"}" when processed
     Failure/Error: Unable to find matching line from backtrace
     IOError:
       No such file or directory
     # ./spec/filters/grok_spec.rb:679:in `(root)'

  2) LogStash::Filters::Grok patterns in the 'patterns/' dir override core patterns "{"message":"hello"}" when processed
     Failure/Error: Unable to find matching line from backtrace
     IOError:
       No such file or directory
     # ./spec/filters/grok_spec.rb:653:in `(root)'

Finished in 3.59 seconds
84 examples, 2 failures

Failed examples:

rspec /Users/purbon/.rvm/gems/jruby-1.7.16.1@filter-grok/gems/logstash-devutils-0.0.10-java/lib/logstash/devutils/rspec/logstash_helpers.rb:51 # LogStash::Filters::Grok patterns in custom dir override those in 'patterns/' dir "{"message":"0"}" when processed
rspec /Users/purbon/.rvm/gems/jruby-1.7.16.1@filter-grok/gems/logstash-devutils-0.0.10-java/lib/logstash/devutils/rspec/logstash_helpers.rb:51 # LogStash::Filters::Grok patterns in the 'patterns/' dir override core patterns "{"message":"hello"}" when processed

Randomized with seed 36915

More info on http://build-eu-00.elasticsearch.org/view/LS%20Filters/job/logstash_filter_grok_commit/338/console

Logstash significantly slows down with longer grok pattern

(This issue was originally filed by @rakesh91 at elastic/logstash#3094)


Hi,
I have been using logstash to parse my logs, and lately I wanted to add a few more fields. After updating my grok patterns for this, logstash's speed has dropped significantly: I was getting around 3k before the new pattern, and now around 200 to 300.

I had around 20 fields back then and added 6 more.
What is the reason for this sudden performance degradation?

grok filter: How to match one pattern multiple times?

(This issue was originally filed by @stbka at elastic/logstash#2006)


I want to match one pattern multiple times in a log event. I tried different regex patterns but can't get it working.

Example-event:
This is a statusCode="ERROR_121" text to demonstrate my logevent statusCode="WARNING_2408" structure

What I want to have is a statusCode field with "ERROR_121" as well as "WARNING_2408".
Notice that it is possible that the event does not contain any statusCode.

My problem is that the grok filter either finds just one entry and stops, or, if I combine the pattern with a *, it does not find anything.

Example-pattern:
STATUSCODE [a-zA-Z0-9_-]+
STATUSCODEENTRY statusCode=.%{STATUSCODE:statusCode}.
STATUSCODES (%{STATUSCODEENTRY}.+)*

I hope somebody can help me.

Thanks.
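
As far as a single grok match goes, only the first occurrence of a capture is returned, so repeated matches usually need something like Ruby's String#scan; a standalone sketch of just the extraction (not a logstash config):

line = 'This is a statusCode="ERROR_121" text to demonstrate my logevent statusCode="WARNING_2408" structure'
status_codes = line.scan(/statusCode="([a-zA-Z0-9_-]+)"/).flatten
# => ["ERROR_121", "WARNING_2408"]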

Passing an array of patterns to grok does not match the second pattern (of two).

The new hash syntax for 'match' can take an array of patterns, but if two patterns are provided, the second is not matched.

Here is a shell "one-liner" demonstrating the negative case:

echo 'banana 7' | /opt/logstash/bin/logstash -e '
input {
  stdin { }
}

filter {
  grok {
    match => {
      "message" => [
        "%{WORD:word}",
        "%{POSINT:number}"
      ]
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
'
{
       "message" => "banana 7",
      "@version" => "1",
    "@timestamp" => "2015-06-09T04:28:55.602Z",
          "host" => "metrics.localdomain",
          "word" => "banana"
}

...and one for the positive case, using a different syntax:

echo 'banana 7' | /opt/logstash/bin/logstash -e '
input {
  stdin { }
}

filter {

  grok {
    match => {
      "message" => "%{WORD:word}"
    }
  }

  grok {
    match => {
      "message" => "%{POSINT:number}"
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
'
{
       "message" => "banana 7",
      "@version" => "1",
    "@timestamp" => "2015-06-09T04:30:11.408Z",
          "host" => "metrics.localdomain",
          "word" => "banana",
        "number" => "7"
}
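
This looks consistent with break_on_match, which defaults to true and stops after the first pattern that matches. A sketch of the same filter with it disabled, in the standalone style used earlier on this page (expected behaviour, not verified output):

grok = LogStash::Filters::Grok.new(
  "match" => { "message" => ["%{WORD:word}", "%{POSINT:number}"] },
  "break_on_match" => false   # try every pattern instead of stopping at the first match
)
grok.register
event = LogStash::Event.new("message" => "banana 7")
grok.filter(event)
# expected: event["word"] == "banana" and event["number"] == "7"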

Old array syntax for match appears deprecated, but still documented.

The docs suggest that the match configuration option can still take an array, but the current code doesn't seem to like it.

Repro script:

echo 'banana 7' | /opt/logstash/bin/logstash -e '
input {
  stdin {}
}

filter {
  grok {
    match => [
      "message",
      "%{WORD:word}",
      "%{POSINT:number}"
    ]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
'

Output:

Invalid setting for grok filter plugin:

  filter {
    grok {
      # This setting must be a hash
      # This field must contain an even number of items, got 3
      match => ["message", "%{WORD:word}", "%{POSINT:number}"]
      ...
    }
  } {:level=>:error}

Add support of IP elasticsearch mapping type

I am using logstash 1.5.2
Currently I can use the pattern %{NUMBER:response:int} and the resulting elasticsearch document will have the mapping
"response": {
  "type": "long"
},
instead of the default mapping type string.

The documentation (https://www.elastic.co/guide/en/logstash/master/plugins-filters-grok.html) says that "Currently the only supported conversions are int and float."

For IP addresses elasticsearch have special IP mapping type
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping-ip-type.html

Can the IP mapping type also be supported in the grok filter for logstash?
It would be cool to use patterns like %{IPV4:IP:ip} and have documents with the ip mapping type.

grok overwrite not working for empty strings

Migrated from elastic/logstash#2590.

So when I have this input 

input="<14>2015-02-11T17:49:29Z logspout dev_ziservice_1[1]: ASDF"

  grok {      
    match => ["message", "<%{NUMBER}>%{TIMESTAMP_ISO8601:syslogTimestamp} %{SYSLOGHOST} %{DATA:container_name}(?:\[%{POSINT}\])?:%{SPACE}%{GREEDYDATA:message}"]
    overwrite => [ "message" ]
  }


the resulting message is ASDF


If I have this input = "<14>2015-02-11T17:49:29Z logspout dev_ziservice_1[1]: "
or this input = "<14>2015-02-11T17:49:29Z logspout dev_ziservice_1[1]:"

the message is not " " or "" as displayed in http://grokdebug.herokuapp.com/

Multiple field patterns for message

(This issue was originally filed by @OlesyaShell at elastic/logstash#2286)


Hi!
I need to extract all fields from the events. Some fields may be missing or may appear in a different order.
Examples:
<6> CEF:0|Stonesoft|IPS|5.2.6|271281|HTTP_SLS-Successful-Status-Code|1|spt=60494 deviceExternalId=IPS-1030 (moff) Sensor dmac=E4:C7:22:A4:17:E4 dst=192.168.0.1 requestMethod=GET cat=Protocol Information requestURL=www.ttttt.org app=HTTP rt=Dec 17 2014 20:18:49 act=Permit proto=6 dpt=80 src=172.16.0.1 dvc=172.16.0.1 dvchost=172.16.0.1 smac=90:E2:BA:19:A1:1B cs1Label=RuleId cs1=100.1

I tried a configuration like this:

grok {
  match => [
    "message" => "dpt=%{INT:[dst][port]}"
    "message" => "dst=%{IP:[dst][ip]}"
    "message" => "src=%{IP:[src][ip]}"
    "message" => "rt=%{STONEGATE_DATE:[event][time]}"
    "message" => "deviceFacility=%{DATA}"
    "message" => "src=%{IP:src_ip}"
    "message" => "dst=%{IP:[dst][ip]}"
    .....
  ]
}
and another format:

grok {
  match => {
    "message" => [ "dpt=%{INT:[dst][port]}", "dst=%{IP:[dst][ip]}", "src=%{IP:[src][ip]}" ]
  }
}

with no success.
What is the correct syntax?
I tried this config with 1.4.2, logstash 2.0.0.dev, and 1.5.0.beta1.

I also created full-string patterns for this type of message, checked them on https://grokdebug.herokuapp.com, and tried to parse events with:

grok { # first try
  break_on_match => false # except in 4.2.X
  match => ["message", "patern_1_from_patterns"]
}
grok { # next try - DOES NOT WORK
  match => ["message", "patern_2_from_patterns"]
}

Thank you for your help.
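
For what it's worth, a hedged sketch of the hash syntax combined with break_on_match => false, in the same standalone style as the scripts earlier on this page (only three of the fields are shown; not verified against the CEF sample):

grok = LogStash::Filters::Grok.new(
  "break_on_match" => false,   # keep trying the remaining patterns after a match
  "match" => {
    "message" => [
      "dpt=%{INT:[dst][port]}",
      "dst=%{IP:[dst][ip]}",
      "src=%{IP:[src][ip]}"
    ]
  }
)
grok.register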

Refactor the tests

The test suite of this plugin would benefit from a small refactor to DRY up the code and use the stud gem when dealing with temporary files and folders.

Logstash spikes cpu @100% on grok parse failure

(This issue was originally filed by @sujanks at elastic/logstash#2619)


Hi,

We have logstash 1.4.2 agents running, consuming logs from SQS, with Elasticsearch 1.4.2.

Every time we run the box it lasts for 2 to 5 minutes and then the CPU spikes to 100%, no matter how big the boxes are (EC2: m3.xlarge).

After spending a lot of time on it, we found out that it is due to the grok filter, but it's not clear why.

The following is the grok in our config file:

grok {
      match => [ "Message", "%{DATE:date} %{GREEDYDATA:time} \[%{GREEDYDATA:cloudhubapp}\] %{DATA:loglevel} %{DATA:application}\.%{DATA:component}\.%{DATA:subcomponent}\.%{DATA:position} - %{GREEDYDATA:keyvalue} Message=%{GREEDYDATA:message}" ]
}

These are sample log messages:

2015-02-11 05:03:01,209 [[digital-methode-subscriber].connector.http.mule.default.receiver.2104] INFO  apache.component.content-status-notification-publisher-http-handler.other - transactionID=160246_8aba3078-b1a2-11e4-90ab-ef3fd79aaa94 Message=Content Status Notification message successfully.

2015-02-13 03:09:07,813 [[digital-methode-subscriber].connector.http.mule.default.receiver.31] ERROR org.apache.retry.notifiers.ConnectNotifier - Failed to connectreconnect: Work Descriptor. Root Exception was: One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.. Type: class com.amazonaws.AmazonServiceException

If I remove everything after the hyphen (-) from the grok, logstash is fine and runs at about 20% CPU on an EC2 m1.large.

Removed part

- %{GREEDYDATA:keyvalue} Message=%{GREEDYDATA:message}"

Any idea?

Sujan

filter inclusion "in" operator doesn't work with one element

Hi,

Running Logstash 2.2.1.

It looks like there is a problem with the "in" operator when the array being searched only contains one element. In the below example field "foo1" isn't added:

if ( "foo" in ["foo"] ) {                               
  mutate { add_field => ["foo1", "nope"] }              
}  

Here it works (field "foo2" is correctly added):

if ( "foo" in ["foo", "bar"] )  {                       
  mutate { add_field => ["foo2", "yep"] }               
}      

Am I missing something?

FEATURE : Recursive pattern for Grok

(This issue was originally filed by @M0dM at elastic/logstash#1934)


Hi,

I haven't managed to use recursion inside grok custom patterns.
I think this could be an awesome feature.

Benoit

Description:

A grok pattern matching the following two lines:

2014-07-11 18:26:21,335 - INFO  - 1712933>-<>-<text1>-<>-<text2

2014-07-11 18:26:21,335 - INFO  - 1712933>-<>-<text1>-<>-<text2>-<>-<text3

I want to match both lines and extract data like this:

%{CUSTOM_DATE}[\s-]*%{LOGLEVEL}[\s-]*%{POSINT}%{AMA_VALUES_LIST_DATA}

CUSTOM_DATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}
CUSTOM_VALUE ((?!>-<>-<).)*
CUSTOM_LIST_VALUE >-<>-<%{CUSTOM_VALUE}
CUSTOM_VALUES_LIST_COMPLEX %{CUSTOM_LIST_VALUE}%{CUSTOM_LIST_VALUE_COMPLEX} | %{CUSTOM_LIST_VALUE}

What I would like to get:

     {
      "CUSTOM_DATE": [
        [
          "2014-07-11 18:26:21,335"
        ]
      ],
      "YEAR": [
        [
          "2014"
        ]
      ],
      "MONTHNUM": [
        [
          "07"
        ]
      ],
      "MONTHDAY": [
        [
          "11"
        ]
      ],
      "HOUR": [
        [
          "18"
        ]
      ],
      "MINUTE": [
        [
          "26"
        ]
      ],
      "SECOND": [
        [
          "21,335"
        ]
      ],
      "LOGLEVEL": [
        [
          "INFO"
        ]
      ],
      "POSINT": [
        [
          "1712933"
        ]
      ],
      "CUSTOM_LIST_COMPLEX": [
        [
          ">-<>-<text1>-<>-<text2>-<>-<text3"
        ]
      ],
      "CUSTOM_LIST_VALUE": [
        [
          ">-<>-<text1",
          ">-<>-<text2",
          ">-<>-<text3"
        ]
      ]
      "CUSTOM_VALUE": [
        [
          "text1",
          "text2",
          "text3"
        ]
      ]
    }

custom patterns - permissions denied

(This issue was originally filed by @mcgkev29 at elastic/logstash#1841)


I set up a custom pattern file in /opt/logstash/patterns/patterns/.

When I restart the service, logstash will not work/filter logs.

When I review the logstash logs in /var/log/ I see this:
The error reported is: \n Permission denied - /opt/logstash/patterns/patterns"}

These are the permissions though:
drw-rw-rw- 2 logstash logstash 4096 Oct 3 10:02 patterns

Here is my filter clause:
grok {
break_on_match => "true"
patterns_dir => "./patterns"

Grok assigns the wrong names to captures under some conditions

(This issue was originally filed by @TheFlimFlam at elastic/logstash#2072)


Description:
Composing grok patterns that share named captures results in names binding to the wrong capture in the context of the composition.

Reproduction steps:
Set up the following patterns file (/etc/logstash/patterns/general/test-patterns):

SSH_KEYFILE_ERROR (?<tags>error): (?<failure>Could not load host key): %{PATH:keyfile}
SSH_PASSWORD_FAIL (?<failure>Failed password) for %{USER:username} from %{IPORHOST:clientip} port %{INT:port} %{WORD:protocal}
AUTH_SSH          (%{SSH_KEYFILE_ERROR}|%{SSH_PASSWORD_FAIL})

Run the patterns file using the following logstash config

input {
    generator {
        count   => 1
        message => "Nov 14 14:50:23 puppet sshd[36930]: Failed password for magicaluser from 127.0.0.1 port 43333 ssh2"
    }
}

output {
    stdout { codec => "rubydebug" }
}

filter {
    grok {
        patterns_dir => '/etc/logstash/patterns/general/test-patterns'
        match => [ "message", "%{AUTH_SSH}" ]
    }
}

Will print the following to standard out:

{
       "message" => "Nov 14 14:50:23 puppet sshd[36930]: Failed password for magicaluser from 127.0.0.1 port 43333 ssh2",
      "@version" => "1",
    "@timestamp" => "2014-11-14T03:16:26.831Z",
          "host" => "1051a1523d6e",
      "sequence" => 0,
      "username" => "Failed password",
      "clientip" => "magicaluser",
      "protocal" => "43333"
}

Expected output:

  • A field should exist called failure which captures the text 'Failed password'
  • The field username should contain the text 'magicaluser'

Validation of match config

If, by mistake, the match config contains a hash instead of a valid string or array, like this:

grok {
  match => {
    a => { "b" => "c" }
  }
}

Logstash fails to start with the cryptic error can't convert Array into String.
For a concrete example where a user can create this case and get lost, see https://discuss.elastic.co/t/filter-if-results-in-cant-convert-array-into-string-error/38805

Is there a way for grok and this plugin to raise a better error message when the match config is invalid, before jumping into config parser discussions?

Parse regex to a nested field

I'm trying to parse using grok filter with a regular expression.
I can store the result in a flat field:

(?<permission.user.read>[r-])

It gives:

        "permission.user.read" => "r",

But I would like to store the result in a nested structure such as:

permission:
    user: 
        read:  "r"

So I tried the common convention:

(?<[permission][user][read]>[r-])

But grok failed in that case:

The error reported is: 
  invalid char in group name <[permission][user][read]>: /(?<type>[d-])(?<[permission][user][read]>[r-])(?<permission.user.write>[w-])(?<permission.user.execute>[x-])(?<permission.group.read>[r-])(?<permission.group.write>[w-])(?<permission.group.execute>[x-])(?<permission.other.read>[r-])(?<permission.other.write>[w-])(?<permission.other.execute>[x-]) (?<INT:links>(?:[+-]?(?:[0-9]+))) (?<USERNAME:user>[a-zA-Z0-9._-]+) (?<USERNAME:group>[a-zA-Z0-9._-]+) (?:\s*)(?<NUMBER:size>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) (?<TIMESTAMP_ISO8601:date>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<NOTSPACE:timezone>\S+)(?<GREEDYDATA:name>.*)/m

If I do the same with a preregistered grok pattern, it works fine:

%{NUMBER:[metadata][size]}

gives:

    "metadata" => {
        "size" => "11"
    },

We should either fix this, document that the nested format is not possible, or document how nested fields can be used.
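
Until that is fixed or documented, one workaround suggested by the working %{NUMBER:[metadata][size]} case is to register small custom patterns and use the %{PATTERN:[nested][field]} form instead of raw (?<...>) captures; a sketch with made-up pattern names:

# hypothetical custom pattern file, e.g. ./patterns/flags, containing:
#   RFLAG [r-]
#   WFLAG [w-]
#   XFLAG [x-]
grok = LogStash::Filters::Grok.new(
  "patterns_dir" => ["./patterns"],   # assumed location of the custom pattern file
  "match" => { "message" => "%{RFLAG:[permission][user][read]}%{WFLAG:[permission][user][write]}%{XFLAG:[permission][user][execute]}" }
)
grok.register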

not parsing properly inside logstash 2 beta 2

I was parsing syslog events, and I got this mapping in version 2 beta 2:

program: <134>2015-10-16T14
referrer: <134>2015-10-16T14:39:35.033147+02:00 centos6-
request: <134>2015-10-16T1
response: <13
syslog_facility: local0
syslog_facility_code: 16
syslog_pri: <13
syslog_severity: informational
syslog_severity_code: 6
timestamp: <134>2015-10-16T14:39:35.0
type: syslog
verb: <13

I rolled back to 2 beta 1 and got a far better result:

response: 200
syslog_facility: local0
syslog_facility_code: 16
syslog_pri: 134
syslog_severity: informational
syslog_severity_code: 6
timestamp: 16/Oct/2015:15:26:16 +0200
type: syslog
verb: GET

I hope I posted this to the right repo; it seems to me to be related to grok, but it could also be logstash itself.
I'm running logstash with 16 filter threads.
