
logstash-input-azure_blob_storage's People

Contributors

irnc, janmg, lemyst, nttoshev, yann-j


logstash-input-azure_blob_storage's Issues

Exception: Faraday::ConnectionFailed

Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-networksecuritygroupflowevent", logtype=>"nsgflowlog", interval=>60, id=>"887c74298e1b2c107836e7dc02c22a18c516c2171c18d14b7e0b4b34158ca261", connection_string=>, prefix=>"resourceId=/", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_e8fc0f24-18ab-4047-8a59-75c0f06e9629", enable_metric=>true, charset=>"UTF-8">, dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", registry_create_policy=>"resume", file_head=>"{"records":[", file_tail=>"]}">
Error: Connection reset by peer
Exception: Faraday::ConnectionFailed
Stack: org/jruby/ext/openssl/SSLSocket.java:893:in sysread_nonblock' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/jopenssl23/openssl/buffering.rb:182:in read_nonblock'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:175:in rbuf_fill' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:157:in readuntil'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:167:in readline' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:40:in read_status_line'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:29:in read_new' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1504:in block in transport_request'
org/jruby/RubyKernel.java:1193:in catch' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1501:in transport_request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1474:in request' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1467:in block in request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:914:in start' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1465:in request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1223:in get' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/adapter/net_http.rb:85:in perform_request'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/adapter/net_http.rb:43:in block in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/adapter/net_http.rb:92:in with_net_http_connection'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/adapter/net_http.rb:38:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday_middleware-0.13.1/lib/faraday_middleware/response/follow_redirects.rb:87:in perform_with_redirection'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday_middleware-0.13.1/lib/faraday_middleware/response/follow_redirects.rb:75:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/rack_builder.rb:143:in build_response'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.17.0/lib/faraday/connection.rb:387:in run_request' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/http_response_helper.rb:27:in set_up_response'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:149:in call' org/jruby/RubyMethod.java:116:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/signer_filter.rb:28:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:110:in block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/service.rb:36:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/filtered_service.rb:34:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/signed_service.rb:41:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-common-1.1.0/lib/azure/storage/common/service/storage_service.rb:60:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob_service.rb:179:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob.rb:106:in get_blob'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:260:in partial_read_json' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:199:in block in run'
org/jruby/RubyHash.java:1417:in each' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:191:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:321:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:313:in block in start_input'

Log processing stopped while reading corrupted blob

Hi,
we are collecting logs in .gz format, which is why we use the gzip_lines codec plugin.
This is our input config:

    azure_blob_storage {
        storageaccount => "myacc"
        access_key => "password"
        container => "logs"
        interval => 30
        file_head => '{"@t"'
        file_tail => '"}'
        registry_path => "logstash/registry.dat"
        codec => gzip_lines { charset => "ASCII-8BIT"}
    }

Log processing stopped and the following error occurred when Logstash tried to process a corrupted blob:

[2020-03-12T03:30:32,730][ERROR][logstash.javapipeline    ][main] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::AzureBlobStorage container=>"logs", codec=><LogStash::Codecs::GzipLines charset=>"ASCII-8BIT", id=>"ea144f5d-e64d-4341-b297-2dcfc7f1cf2d", enable_metric=>true>, file_head=>"{\"@t\"", storageaccount=>"myacc", access_key=><password>, file_tail=>"\"}", interval=>30, id=>"7a23ccd608a25d6b67053c55689b3080829ddf0bb25017crty50b27ebec539fc", registry_path=>"logstash/registry.dat", enable_metric=>true, logtype=>"raw", dns_suffix=>"core.windows.net", registry_create_policy=>"resume", debug_until=>0, path_filters=>["**/*"]>
  Error: Broken pipe - Unexpected end of ZLIB input stream
  Exception: Errno::EPIPE
  Stack: org/jruby/ext/zlib/JZlibRubyGzipReader.java:652:in `each'
org/jruby/ext/zlib/JZlibRubyGzipReader.java:662:in `each_line'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-gzip_lines-3.0.4/lib/logstash/codecs/gzip_lines.rb:38:in `decode'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.2/lib/logstash/inputs/azure_blob_storage.rb:227:in `block in run'
org/jruby/RubyHash.java:1428:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.2/lib/logstash/inputs/azure_blob_storage.rb:201:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:328:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:320:in `block in start_input'
[2020-03-12T03:33:27,090][ERROR][logstash.javapipeline    ][main] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main

I think that in this situation the plugin should not raise an error, but instead skip the corrupted blob and move on to the next one.

Error: BlobArchived (409): This operation is not permitted on an archived blob.

Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-networksecuritygroupflowevent", logtype=>"nsgflowlog", interval=>60, id=>"f24d31350d3f11c4bf5b63755eee399615ae21f22556ac6850fd3f4e23676b83", connection_string=>, prefix=>"resourceId=/", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_fa68ed3e-bcd3-49fe-8216-b6b515bf96a9", enable_metric=>true, charset=>"UTF-8">, dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", registry_create_policy=>"resume", file_head=>"{"records":[", file_tail=>"]}">
Error: BlobArchived (409): This operation is not permitted on an archived blob.
RequestId:947b3fff-201e-006c-60f7-4a780a000000
Time:2021-05-17T08:34:40.5724187Z
Exception: Azure::Core::Http::HTTPError
Stack: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:153:in call' org/jruby/RubyMethod.java:116:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/signer_filter.rb:28:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:110:in block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/service.rb:36:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/filtered_service.rb:34:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/signed_service.rb:41:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-common-1.1.0/lib/azure/storage/common/service/storage_service.rb:60:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob_service.rb:179:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob.rb:106:in get_blob'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:256:in full_read' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:196:in block in run'
org/jruby/RubyHash.java:1417:in each' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.1/lib/logstash/inputs/azure_blob_storage.rb:191:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:321:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:313:in block in start_input'

Not able to ship Azure Activity Logs to Graylog / ELK

Hi,
I am not able to push Azure Activity Logs to Graylog or Elasticsearch.
I am using Logstash 8.7.1 with the gelf output, running via docker-compose.
Here is my Logstash configuration file:

input {
     azure_blob_storage {
          connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_NAME>;AccountKey=<BLOB_ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         container => "insights-activity-logs"
         registry_create_policy => "start_over"
         codec => "json"
         addall => true
         path_filters => ['**/*.json']
         addfilename => true
         prefix => "resourceId=/"
         # Possible options: `do_not_break`, `with_head_tail`, `without_head_tail`
         interval => 5
     }
 }

filter {
    json {
        source => "message"
    }
    mutate {
        add_field => {"short_message" => "This is short message"}
        add_field => { "host" => "127.0.0.1" }
        #remove_field => [ "message" ]
    }
    date {
        match => ["unixtimestamp", "UNIX"]
    }
}

output {
    gelf {
        host => "127.0.0.1"
        port => 12201
    }
    stdout {
      codec => rubydebug {metadata => true}
    }
}

I can see the messages on `stdout`, however Logstash is not pushing the logs to the gelf UDP output.

I am getting the error below:
`ArgumentError: short_message is missing. Options version, short_message and host must be set.`
What would be the ideal configuration for Azure Activity Logs?
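One thing worth checking, as a hedged sketch: the error above says the gelf output needs short_message and host on every event, so copying the event's own message into short_message may be more robust than a static add_field. The copy source ("message") and the host value below are assumptions:

filter {
    mutate {
        # copy the parsed event text into short_message (assumes a "message" field exists)
        copy => { "message" => "short_message" }
        # gelf also requires a host field on every event
        add_field => { "host" => "logstash-activity-logs" }
    }
}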


How to update the version to 0.12.7 from 0.9.6

1. I used the command /usr/share/logstash/bin/logstash-plugin install logstash-input-azure_blob_storage to install the plugin and then ran the update command.

The logstash-input-azure_blob_storage version did not change.

However, the 0.9.6 version does not support the connection_string option, so I need to use version 0.12.7.

Could anybody tell me how to install or update the logstash-input-azure_blob_storage plugin to version 0.12.7?
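For reference, a sketch of the plugin-manager commands that typically update or pin a plugin version, assuming a standard package install under /usr/share/logstash (paths may differ on your system):

/usr/share/logstash/bin/logstash-plugin update logstash-input-azure_blob_storage

# or remove the old version and install a specific release explicitly
/usr/share/logstash/bin/logstash-plugin remove logstash-input-azure_blob_storage
/usr/share/logstash/bin/logstash-plugin install --version 0.12.7 logstash-input-azure_blob_storage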

Parse error when reading NSG Flow Logs v2

Hey there! First of all, thanks for creating this plugin! I am trying to use it with the pipeline example of the README.md file for logtype => nsgflowlogs, but I get some parsing errors. Example: parse error on FLOWLOGS [2023/02/28-20:00] offset: 1614089 length: 1650380.

Any idea what could be going wrong and how to fix it?

Reading JSON append blobs from storage account

Hello,

Currently I have a diagnostic setting on my Azure Data Factory that sends pipeline/activity logs to a container in a storage account in the following format: "container/dir/dir/dir/y=2023/m=05/d=02/h=10/m=00/PT1H.json". Every hour, incoming logs from ADF get appended to one append blob (PT1H.json). So far, the input plugin works fine when reading historical blobs (not the current hour) and records the offsets of read blobs in the registry file. However, I'm running into an issue when reading the append blob for the current hour. Scenario: it's 12:00 and logs are written to the storage account > Logstash is running and picks up the new JSON file > at 12:05 new logs are appended to the same JSON file > I get the error below.


[INFO ] 2023-05-02 10:48:51.835 [[main]<azure_blob_storage] azureblobstorage - resuming from remote registry data/registry.dat
[ERROR] 2023-05-02 10:48:52.947 [[main]<azure_blob_storage] javapipeline - A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main
Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-pipelineruns", codec=><LogStash::Codecs::JSONLines id=>"json_lines_cd27bbac-2203-44c5-9469-8925f1f88948", enable_metric=>true, charset=>"UTF-8", delimiter=>"\n">, interval=>10, id=>"f967be9e17d3af9286ab0875ce6754357103745f10dc7217b8275c3568271f9b", storageaccount=>"saeu2afglogpoc", access_key=>, enable_metric=>true, logtype=>"raw", dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", registry_create_policy=>"resume", addfilename=>false, addall=>false, debug_until=>0, debug_timer=>false, skip_learning=>false, file_head=>"{"records":[", file_tail=>"]}", path_filters=>["**/*"]>
Error: InvalidBlobType (409): The blob type is invalid for this operation.
RequestId:b063a660-f01e-0013-0ee3-7c48cd000000
Time:2023-05-02T10:48:52.8242660Z
Exception: Azure::Core::Http::HTTPError
Stack: /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:154:in call' org/jruby/RubyMethod.java:116:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/signer_filter.rb:28:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:111:in block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/service.rb:36:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/filtered_service.rb:34:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/signed_service.rb:41:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/storage/common/service/storage_service.rb:60:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:179:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/block.rb:276:in list_blob_blocks'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:413:in partial_read' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:271:in block in run'
org/jruby/RubyHash.java:1519:in each' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:246:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:414:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in block in start_input'

I guess what I'm trying to find out in short is if this plugin is able to read from append blobs. Please let me know if you need any further information on this matter. Thanks in advance.

Index -1 out of bounds for length

I'm giving the plugin a try, but I have no idea why I'm getting the error below.

Running Logstash 8.10.4 fresh install

Using bundled JDK: /usr/share/logstash/jdk
Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true
logstash 8.10.4

Logstash Config:

input {
    azure_blob_storage {
        codec => "json"
        storageaccount => "nsgflowsiemtest"
        access_key => "base64=="
        container => "insights-logs-networksecuritygroupflowevent"
        logtype => "nsgflowlog"
        prefix => "resourceId=/"
        path_filters => ['**/NSG-SIEM-POC/**/*.json']
        interval => 30
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    stdout{codec => rubydebug}
}

For reference, the full location of the json in the storage account is below:
resourceId=/SUBSCRIPTIONS/7123871293721379/RESOURCEGROUPS/SIEM-POC/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/NSG-SIEM-POC/y=2023/m=10/d=27/h=15/m=00/macAddress=7812738HD/PT1H.json

Getting the following error:

[INFO ] 2023-10-27 13:57:38.074 [[main]-pipeline-manager] azureblobstorage - === azure_blob_storage 0.12.9 / main / 791cae / ruby 3.1.0p0 ===
[INFO ] 2023-10-27 13:57:38.074 [[main]-pipeline-manager] azureblobstorage - If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage
[INFO ] 2023-10-27 13:57:38.084 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2023-10-27 13:57:38.098 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[ERROR] 2023-10-27 13:57:38.379 [[main]<azure_blob_storage] azureblobstorage - caught: undefined local variable or method `path' for #<LogStash::Inputs::AzureBlobStorage:0x6e1391e9>
[ERROR] 2023-10-27 13:57:38.380 [[main]<azure_blob_storage] azureblobstorage - loading registry failed for attempt 1 of 3
[ERROR] 2023-10-27 13:57:38.451 [[main]<azure_blob_storage] azureblobstorage - caught: undefined local variable or method `path' for #<LogStash::Inputs::AzureBlobStorage:0x6e1391e9>
[ERROR] 2023-10-27 13:57:38.452 [[main]<azure_blob_storage] azureblobstorage - loading registry failed for attempt 2 of 3
[ERROR] 2023-10-27 13:57:38.485 [[main]<azure_blob_storage] azureblobstorage - caught: undefined local variable or method `path' for #<LogStash::Inputs::AzureBlobStorage:0x6e1391e9>
[ERROR] 2023-10-27 13:57:38.485 [[main]<azure_blob_storage] azureblobstorage - loading registry failed for attempt 3 of 3
[INFO ] 2023-10-27 13:57:38.486 [[main]<azure_blob_storage] azureblobstorage - learn_encapsulation, this can be skipped by setting skip_learning => true. Or set both head_file and tail_file
[INFO ] 2023-10-27 13:57:39.295 [[main]<azure_blob_storage] azureblobstorage - learn json header and footer failed because Index -1 out of bounds for length 4299
[INFO ] 2023-10-27 13:57:39.295 [[main]<azure_blob_storage] azureblobstorage - head will be: '' and tail is set to: ''
[ERROR] 2023-10-27 13:57:41.159 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 13:58:10.738 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 13:58:41.230 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 13:59:11.240 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 13:59:40.411 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:00:11.547 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:00:40.893 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:01:13.502 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:01:41.194 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:02:11.273 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:02:43.040 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
[ERROR] 2023-10-27 14:03:11.053 [[main]<azure_blob_storage] azureblobstorage - caught: Index -1 out of bounds for length 5464374 while trying to list blobs
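Given the "learn_encapsulation ... can be skipped by setting skip_learning => true" message above, a hedged variant of the same input with learning disabled and the NSG head/tail set explicitly may avoid the learning failure. The head/tail values are taken from the defaults shown in other config dumps in this document; this is a sketch, not a confirmed fix:

input {
    azure_blob_storage {
        codec => "json"
        storageaccount => "nsgflowsiemtest"
        access_key => "base64=="
        container => "insights-logs-networksecuritygroupflowevent"
        logtype => "nsgflowlog"
        prefix => "resourceId=/"
        path_filters => ['**/NSG-SIEM-POC/**/*.json']
        interval => 30
        # skip the head/tail learning step that fails above
        skip_learning => true
        file_head => '{"records":['
        file_tail => ']}'
    }
}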

Plugin error and reload

Hi Jan,

I'm getting an error (and plugin restarting) for one container only. It happens often but not every time. Other containers are working fine with very similar plugin config (in other Logstash pipelines). The container is currently 60GB with 1000 blobs and gets written to continuously. In the error log I'm seeing:

Error: Connection reset by peer
Exception: Faraday::ConnectionFailed
Stack: org/jruby/ext/openssl/SSLSocket.java:874:in sysread_nonblock' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/jruby-openssl-0.10.5-java/lib/jopenssl23/openssl/buffering.rb:182:in read_nonblock'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:175:in rbuf_fill' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:125:in read'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:293:in block in read_body_0' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:278:in inflater'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:283:in read_body_0' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:204:in read_body'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1224:in block in get' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.587/lib/seahorse/client/net_http/patches.rb:38:in block in new_transport_request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http/response.rb:165:in reading_body' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.587/lib/seahorse/client/net_http/patches.rb:37:in new_transport_request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1474:in request' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1467:in block in request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:914:in start' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1465:in request'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/http.rb:1223:in get' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/adapter/net_http.rb:85:in perform_request'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/adapter/net_http.rb:43:in block in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/adapter/net_http.rb:92:in with_net_http_connection'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/adapter/net_http.rb:38:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday_middleware-0.14.0/lib/faraday_middleware/response/follow_redirects.rb:87:in perform_with_redirection'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday_middleware-0.14.0/lib/faraday_middleware/response/follow_redirects.rb:75:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/rack_builder.rb:143:in build_response'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/faraday-0.15.4/lib/faraday/connection.rb:387:in run_request' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/http_response_helper.rb:27:in set_up_response'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:149:in call' org/jruby/RubyMethod.java:115:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/signer_filter.rb:28:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/http/http_request.rb:110:in block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/service.rb:36:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/filtered_service.rb:34:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-core-0.1.15/lib/azure/core/signed_service.rb:41:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-common-1.1.0/lib/azure/storage/common/service/storage_service.rb:60:in call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob_service.rb:179:in call' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/azure-storage-blob-1.1.0/lib/azure/storage/blob/blob.rb:106:in get_blob'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.4/lib/logstash/inputs/azure_blob_storage.rb:306:in full_read' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.4/lib/logstash/inputs/azure_blob_storage.rb:238:in block in run'
org/jruby/RubyHash.java:1415:in each' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.11.4/lib/logstash/inputs/azure_blob_storage.rb:233:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:378:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:369:in block in start_input'

Is there a way for me to debug this? I'm using the latest plugin version.

Thanks,
Sera

Logstash with plugin output failed

Set up an Azure VM (Ubuntu 18, with Java 8) and installed the Logstash plugin successfully.

Set up Network Watcher to send NSG flow logs to a storage account successfully.

/etc/logstash/conf.d/logstash.conf:

input {
  azureblob {
    storage_account_name => "flowloghost"
    storage_account_key => "......"
    container => "insights-logs-networksecuritygroupflowevent"
  }
}

output {
  stdout
}

When I start Logstash with /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf, an error message is triggered.

How can I modify the logstash.conf file?
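The input block above uses the older azureblob plugin rather than this repository's azure_blob_storage plugin. A minimal sketch of the equivalent input for logstash-input-azure_blob_storage, reusing the same account details and the option names shown elsewhere in this document (the interval value is an arbitrary choice):

input {
    azure_blob_storage {
        storageaccount => "flowloghost"
        access_key => "......"
        container => "insights-logs-networksecuritygroupflowevent"
        logtype => "nsgflowlog"
        prefix => "resourceId=/"
        codec => "json"
        interval => 60
    }
}

output {
    stdout { codec => rubydebug }
}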

ERROR The specified blob does not exist

Hello, I get an error when I start Logstash.

I have correctly filled in the required parameters with my Azure configuration, but when I start Logstash it does not work because: The specified blob does not exist.

Can someone help me? Here's my configuration file:

input {
  azure_blob_storage {
    storageaccount => "my_storageaccount"
    access_key => "my_key"
    container => "monitoring-test"
  }
}

filter {
mutate {
gsub => [
"message", "\xE9", "e",
"message", "\xEA", "e",
"message", "\xE8", "e"
]
}
grok {
patterns_dir => ["./pattern-maxence"]
match => { "message" => "%{DATE:log_timestamp}.?%{WORD:log_type}.?Serveur .* dans un .* %{ETAT:etat}" }
#match => { "message" => "%{DATE:log_timestamp}.?%{WORD:log_type}.?Initalisation du thread applicatif termin.e (r.sultat = %{INITIALISATION:initialisation})" }
#match => { "message" => "%{DATE:log_timestamp}.
?%{WORD:log_type}.
?Fin du process %{PROCESS:num_process} .* -M"[0-9]* ." -N%{FONCTION:fonction} -d suite . r.ception signal [0-9] non g.r." }
}
date {
match => [ "log_timestamp", "dd:MM:yyyy-HH:mm:ss:SSS" ]
}
}

output {
if "_grokparsefailure" not in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
index => "index-azure-maxence"
}
}
#stdout { codec => rubydebug }
}

Reading Gzip file on azure blob containing json

Is it possible to read .gz files stored in Azure blob storage? The gz files contain JSON. I have used the following config and get the error below.

input {
    azure_blob_storage {
        storageaccount => "ashwin"
        access_key => "12WB3f+exT2wImZgX+N7KgJw=="
        container => "india"
        codec => "json"
    }
}

output {
      elasticsearch {
        user => "elastic"
        password => "F##@AbwOzN"
        ssl => true
        ssl_certificate_verification => false
        hosts => [ "https://127.0.0.1:9200/" ]
        index => "assam-blob-%{+YYYY.MM.dd}"
        cacert => "/etc/logstash/http_ca_1.crt"
      }
}

I also tried a different codec, gzip_lines, but it didn't work.

[2022-04-21T20:10:28,589][INFO ][logstash.inputs.azureblobstorage][main][2b92afafbd9b3b3a837d391ec4215c55812dc93f150871c0c89b7bcf205559ed] learn json one of the attempts failed BlobArchived (409): This operation is not permitted on an archived blob.

Add a filename field in the message

Hello!

I have a question for you: I need to filter my data by the names of my different files. How can I do this?
I see in the TODO list:

show file path in logger

add filepath as part of log message

So I don't think there is an option to solve my problem yet, but maybe I can succeed another way. I have already tried to grok the path_filters like:

grok {
  match => [ "path_filters", "%{GREEDYDATA:filename}" ]
}

But this was not conclusive.

Thx
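The plugin's addfilename option (it appears in other configs in this document) adds the blob name to each event, which could then be used for filtering. A hedged sketch; the exact name of the field the option adds ([filename] below) is an assumption:

input {
    azure_blob_storage {
        storageaccount => "my_storageaccount"
        access_key => "my_key"
        container => "monitoring-test"
        # add the blob name to each event
        addfilename => true
    }
}
filter {
    # tag events coming from blobs whose name matches a pattern (field name is an assumption)
    if [filename] =~ /serveur/ {
        mutate { add_tag => ["serveur"] }
    }
}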

403 when using this plugin with latest ELK stack

Hey there,

I am having trouble getting this plugin to work. I've double checked my access key and account names and they're working fine elsewhere. I've tried enabling the debug logging for the plugin but didn't get anything extra out of it. I've also tried using a different container within the account but I get the same error.

I'm using the latest version of the ELK docker repo from here: https://elk-docker.readthedocs.io/

Here are the steps to reproduce:

sudo docker run -p 5601:5601 -p 9200:9200  -p 5044:5044     -v elk-data:/var/lib/elasticsearch --name elk sebp/elk -d
docker exec -it elk /bin/bash  #into docker
cd $LOGSTASH_HOME
gosu logstash bin/logstash-plugin install logstash-input-azure_blob_storage
vim 01-azure-block-input.conf

input {
    azure_blob_storage {
        storageaccount => "xxx"
        access_key => "xxxx"
        container => "game-logs"
    }
}

ctrl-d #back to host machine
docker restart elk

The container comes back up, starts the pipeline with the new input and then the following errors are present in the logs:

  Pipeline_id:main
  Plugin: <LogStash::Inputs::AzureBlobStorage container=>"game-logs", id=>"742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6", storageaccount=>"xxxx", access_key=><password>, enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_af7a915e-d6e7-48fe-9b23-e3b0b227f448", enable_metric=>true, charset=>"UTF-8">, logtype=>"raw", dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", registry_create_policy=>"resume", interval=>60, addfilename=>false, debug_until=>0, debug_timer=>false, skip_learning=>false, file_head=>"{\"records\":[", file_tail=>"]}", path_filters=>["**/*"]>
  Error: undefined method `each' for nil:NilClass
  Exception: NoMethodError
  Stack: /opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.12.4/lib/logstash/inputs/azure_blob_storage.rb:206:in `run'
/opt/logstash/logstash-core/lib/logstash/java_pipeline.rb:410:in `inputworker'
/opt/logstash/logstash-core/lib/logstash/java_pipeline.rb:401:in `block in start_input'
[2022-10-19T16:14:07,107][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c0204-f01e-0003-5155-e4ab42000000
Time:2022-10-20T07:24:44.3636660Z
[2022-10-19T16:14:07,107][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] loading registry failed for attempt 1 of 3
[2022-10-19T16:14:07,361][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c02d2-f01e-0003-0155-e4ab42000000
Time:2022-10-20T07:24:44.6185183Z
[2022-10-19T16:14:07,361][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] loading registry failed for attempt 2 of 3
[2022-10-19T16:14:07,616][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c03a3-f01e-0003-4355-e4ab42000000
Time:2022-10-20T07:24:44.8723721Z
[2022-10-19T16:14:07,616][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] loading registry failed for attempt 3 of 3
[2022-10-19T16:14:07,616][INFO ][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] learn_encapsulation, this can be skipped by setting skip_learning => true. Or set both head_file and tail_file
[2022-10-19T16:14:07,872][INFO ][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] learn json header and footer failed because AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c043b-f01e-0003-4a55-e4ab42000000
Time:2022-10-20T07:24:45.1282245Z
[2022-10-19T16:14:07,872][INFO ][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] head will be: {"records":[ and tail is set to ]}
[2022-10-19T16:14:08,126][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c050d-f01e-0003-0d55-e4ab42000000
Time:2022-10-20T07:24:45.3840773Z for list_blobs retries left 3
[2022-10-19T16:14:08,378][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c05d6-f01e-0003-4955-e4ab42000000
Time:2022-10-20T07:24:45.6369310Z for list_blobs retries left 2
[2022-10-19T16:14:08,632][ERROR][logstash.inputs.azureblobstorage][main][742b704f496c4aa285833a1faebee4eaaa92e2be70dd78e283a05b52aa3944a6] caught: AuthenticationFailed (403): Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:b55c069b-f01e-0003-0355-e4ab42000000
Time:2022-10-20T07:24:45.8887867Z for list_blobs retries left 1

Can multiple logstash instances with plugin logstash-input-azure_blob_storage installed to read the same container read without duplicate processing?

I have a requirement to run multiple Logstash instances reading from the same Azure storage account and the same container. The container has activity logs. I am running two Logstash instances, and when I check the output of both, I find the same activity logs in each. I don't want duplicate logs. Does this plugin avoid duplicate processing, or do we need to set a specific config to achieve this?
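I can't confirm whether the plugin coordinates multiple readers of the same container, so some duplication is to be expected with two independent instances. A hedged workaround sketch is to deduplicate downstream by using a content fingerprint as the document id, assuming an Elasticsearch output (hosts and index below are placeholders):

filter {
    fingerprint {
        source => ["message"]
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "activity-logs"
        # identical events map to the same document id, so the second write overwrites the first
        document_id => "%{[@metadata][fingerprint]}"
    }
}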

Written file and logs are not aligned

The logstash logs show:
[2020-05-15T04:46:47,790][INFO ][logstash.inputs.azureblobstorage][nsg] nsg partial file resourceId=/SUBSCRIPTIONS/XXX.../BASTION-SG/y=2020/m=05/d=15/h=04/m=00/macAddress=XXXXXXXEF06/PT1H.json from 75184 to 76751
[2020-05-15T04:46:47,846][INFO ][logstash.inputs.azureblobstorage][nsg] nsg processed 1711 events, saving 714 blobs and offsets to registry data/registry.dat
[2020-05-15T04:47:33,197][INFO ][logstash.inputs.azureblobstorage][nsg] nsg processed 2564 events, saving 714 blobs and offsets to registry data/registry.dat
[2020-05-15T04:48:34,424][INFO ][logstash.inputs.azureblobstorage][nsg] nsg partial file resourceId=/SUBSCRIPTIONS/XXX.../BASTION-SG/y=2020/m=05/d=15/h=04/m=00/macAddress=XXXXXXXEF06/PT1H.json from 78487 to 80052
[2020-05-15T04:48:34,428][INFO ][logstash.inputs.azureblobstorage][nsg] nsg processed 2580 events, saving 714 blobs and offsets to registry data/registry.dat
[2020-05-15T04:49:35,857][INFO ][logstash.inputs.azureblobstorage][nsg] nsg processed 3469 events, saving 714 blobs and offsets to registry data/registry.dat

But when I look at the local file written by Logstash, it only contains data from 2020-05-14;
there is nothing for 2020-05-15.

Below is my pipeline:
input {
  azure_blob_storage {
    storageaccount => "xxxxxxxxxx"
    access_key => "....."
    container => "insights-logs-networksecuritygroupflowevent"
    prefix => "resourceId=/"
    logtype => "nsgflowlog"
    codec => "json"
    interval => 30
  }
}
filter {
  json { source => "message" }
  if [rule] == "UserRule_allow_my_lb" or [dst_ip] in ["10.113.27.176", "10.18.112.36"] {
    drop { }
  }
  mutate { add_field => { "environment" => "test-env" } }
  date {
    match => ["unixtimestamp", "UNIX"]
    target => "@timestamp"
    timezone => "UTC"
  }
  if ![src_bytes] { mutate { coerce => { "[src_bytes]" => "0" } } }
  if ![dst_bytes] { mutate { coerce => { "[dst_bytes]" => "0" } } }
  if ![src_pack]  { mutate { coerce => { "[src_pack]" => "0" } } }
  if ![dst_pack]  { mutate { coerce => { "[dst_pack]" => "0" } } }
}
output {
  file {
    path => "/logarchive/networksecuritygroup/nsg_%{+YYYYMMdd}.log"
    codec => line { format => "%{@timestamp}|%{environment}|%{subscription}|%{resourcegroup}|%{nsg}|%{rule}|%{src_ip}|%{dst_ip}|%{src_port}|%{dst_port}|%{protocol}|%{direction}|%{decision}|%{flowstate}|%{src_pack}|%{src_bytes}|%{dst_pack}|%{dst_bytes}" }
  }
}

VNet Flowlog with Plugin

We are trying to use your plugin to visualize VNet flow logs in Kibana via Logstash. However, we are seeing no input from the storage account. We are not sure whether this is because the plugin is still being developed for VNet flow logs or because of an incorrect Logstash configuration. Please assist. Thank you.

(The Logstash configuration was attached as logstash.conf.json, together with a screenshot.)

Azure container with access policy

Hi,
I tried to read files from my Azure container but I get an error:

"[2021-02-16T07:59:21,432][INFO ][logstash.inputs.azureblobstorage][main][8fc49f3655f2c70ab7e37a2d0379d5e54d919ee1c50b0c7dc64721f8b31a3049] resuming from remote registry data/registry.dat
[2021-02-16T07:59:24,069][ERROR][logstash.inputs.azureblobstorage][main][29a4c675ca27f6ed56d4ce2e0aeec3534fcfad468f8f5d7befb17d452685398a] caught: BlobNotFound (404): The specified blob does not exist.
RequestId:c40fcf4a-701e-000b-4739-04f6ec000000
"
When I use another container name, everything works fine.
The difference between them is that the container for which I get the error has an access policy, so I guess that is the problem.

How should I access a container with an access policy? (The container type is blob.)
Thanks.

Support for any file type

I'm using the azure_blob_storage input plugin to process files other than JSON and line-based log files. Most of the files are actually XML (plus some binary files like TIFF or PDF). I'm using the XML filter to parse my data, and for other file types I process only the filename and type. While my filter and output pipeline work well, the input plugin logs an error for every input file, like:

[2021-06-17T13:33:31,890][ERROR][logstash.codecs.json ][main][19f37e5f946e8210f25a29fbea722302abeea634fd55cb2369731ed301ea1741] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: (String)" ....

Could functionality be added to allow any file format (or simply a way to suppress the error)?

Thanks

Claus
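A hedged sketch, assuming the default json codec is what produces these parse errors: switching the input codec to plain hands the raw content to the filter stage (where the XML filter can take over) instead of attempting JSON parsing. Account, container and path values below are placeholders:

input {
    azure_blob_storage {
        storageaccount => "mystorageaccount"
        access_key => "..."
        container => "documents"
        # only pick up XML blobs; other file types could get their own input block
        path_filters => ['**/*.xml']
        addfilename => true
        codec => "plain"
    }
}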

Resume policy doesn't resume

Hi,
With the following configuration set:
input {
  azure_blob_storage {
    connection_string => ""
    container => "insights-logs-networksecuritygroupflowevent"
    #registry_create_policy => "resume"
    logtype => "nsgflowlog"
    prefix => "resourceId=/"
    interval => 60
  }
}
I have noticed that the plugin goes into an endless loop: after it reaches the end of registry.dat, it lists all the blobs in the storage account again and starts re-emitting older events that have already been processed. This affects how events arrive; after a while I can no longer see the events for the current hour, for example.
Somehow, the resume policy doesn't resume, but goes back to the beginning.

Issue reading timestamp

Hi - I have been using your latest plugin version, which has been working very well, up until about a week ago, when the timestamp stopped being extracted correctly. Here is an example of a log which was working fine until a week ago. The timestamp ('eventTime') is suddenly being output as 'random' months in the year 2018:

{"request":[{"id":"|19e7d278-483a3c4282fab47b.","name":"GET Heartbeat/Get","count":1,"responseCode":200,"success":true,"url":"https://api.sanitised.com/api/heartbeat","urlData":{"base":"/api/heartbeat","host":"api.sanitised.com","hashTag":"","protocol":"https"},"durationMetric":{"value":3030.0,"count":1.0,"min":3030.0,"max":3030.0,"stdDev":0.0,"sampledValue":3030.0}}],"internal":{"data":{"id":"4ebe7c0f-cc3d-11fa-b4d5-71de291dc9ce","documentVersion":"1.61"}},"context":{"application":{"version":"1.0.0.0"},"data":{"eventTime":"2020-07-20T03:45:03.8916679Z","isSynthetic":false,"samplingRate":100.0},"cloud":{},"device":{"type":"PC","roleName":"api","roleInstance":"RE00145D02761B","screenResolution":{}},"user":{"anonId":"anonym","authId":"anonym","isAuthenticated":true},"session":{"isFirst":false},"operation":{"id":"18e4f228-483c4e4228feb54a","parentId":"18e4f228-483c4e4228feb54a","name":"GET Heartbeat/Get"},"location":{"clientip":"0.0.0.0","continent":"Europe","country":"United Kingdom"},"custom":{"dimensions":[{"httpMethod":"GET"},{"AspNetCoreEnvironment":"Production"}]}}}

My logstash input is like this:

input {
    azure_blob_storage {
        storageaccount => "somestorageaccount"
        access_key => "someaccesskey"
        container => "somecontainer"
        prefix => "live-serve-customer_dfad752da2e543c7bdf0b7474ddc7a34/Requests/"
        codec => "json_lines"
        registry_create_policy => "resume"
        interval => 3600
        debug_timer => true
        registry_local_path => '/usr/share/logstash/plugins'
        type => "cust-requests"
    }
}

Do you have any idea why the date might have stopped being processed correctly?

Thanks,
Sera
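In case it helps to narrow this down, a hedged filter sketch that sets @timestamp explicitly from the eventTime value in the sample above; the field path assumes the json_lines codec parses the document so that eventTime ends up under [context][data]:

filter {
    date {
        match => [ "[context][data][eventTime]", "ISO8601" ]
        target => "@timestamp"
        timezone => "UTC"
    }
}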

0.10.2 Broken

The code is looking for "use_redis" and there is no way to pass it in from the input config.

registry.dat reset

Hi,

The registry.dat attached to this issue shows that after 19 October it started writing data from 19 September.
--> The Logstash service was restarted on 19 October, before this issue occurred.

The registry_create_policy is set to default.

I also noticed this message in logs:

[2021-10-19T07:34:31,463][INFO ][logstash.inputs.azureblobstorage][nsg-cut][bd5b8a18d8fac3940390f2673c9391a24fcead2e3d0ea73748837d175ffcc670] Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!
[2021-10-19T07:34:52,166][INFO ][logstash.inputs.azureblobstorage][nsg-cut][bd5b8a18d8fac3940390f2673c9391a24fcead2e3d0ea73748837d175ffcc670] Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!


Thank you,

More logs generated than in the source file on the storage account for NSG logs

I downloaded "PT1H.json" from the storage account in the Azure Portal GUI and compared it with the local file on the Logstash host. I found that the local file's log count is higher than the source file's.

In "PT1H.json" there is only 1 log entry (I just tested an SSH login event), but in the local file there are 3 log lines.

Below is my pipeline:
(pipeline configuration attached as a screenshot)

undefined method `each' for nil:NilClass

Hello,

I'm using the following config to ingest the flow logs of an Azure NSG (version 1):

azure_blob_storage {
  storageaccount => "xxxxxxxxflowlogs"
  access_key => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  container => "insights-logs-networksecuritygroupflowevent"
  registry_create_policy => "resume"
  logtype => "nsgflowlog"
  codec => "json"
  tags => "nsg"
  id => "nsg_5"
}

But I am getting the following error and crash:
[2021-12-20T17:38:49,688][ERROR][logstash.javapipeline ][main][nsg_3] A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main
Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-networksecuritygroupflowevent", codec=><LogStash::Codecs::JSON id=>"json_e3ee24a1-4cef-4acd-88ad-fxxxxxxxxxxxxxxxx", enable_metric=>true, charset=>"UTF-8">, logtype=>"nsgflowlog", storageaccount=>"xxxxxxflowlogs", access_key=>, registry_create_policy=>"resume", id=>"nsg_3", tags=>["nsg"], enable_metric=>true, dns_suffix=>"core.windows.net", registry_path=>"data/registry.dat", interval=>60, addfilename=>false, debug_until=>0, debug_timer=>false, skip_learning=>false, file_head=>"{"records":[", file_tail=>"]}", path_filters=>["**/*"]>

Error: undefined method `each' for nil:NilClass

Exception: NoMethodError

Stack: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.12.0/lib/logstash/inputs/azure_blob_storage.rb:392:in nsgflowlog' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.12.0/lib/logstash/inputs/azure_blob_storage.rb:257:in block in run'
org/jruby/RubyHash.java:1415:in each' /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-azure_blob_storage-0.12.0/lib/logstash/inputs/azure_blob_storage.rb:226:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:409:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:400:in block in start_input'

How can I figure out which record is breaking the plugin? Is there a way to ignore the bad record and move on?

Thanks for your great plugin!
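To narrow down which blob or record triggers the nil error, a hedged sketch of the same input with the plugin's debug_until option raised (the option appears as debug_until=>0 in the config dump above, so it exists; its exact semantics are an assumption here):

azure_blob_storage {
    storageaccount => "xxxxxxxxflowlogs"
    access_key => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    container => "insights-logs-networksecuritygroupflowevent"
    registry_create_policy => "resume"
    logtype => "nsgflowlog"
    codec => "json"
    tags => "nsg"
    id => "nsg_5"
    # log extra detail for the first N events/blobs (assumed behaviour)
    debug_until => 10000
}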

failure fired in logstash-input-azure_blob_storage

Errors are fired when I try to use logstash-input-azure_blob_storage. Can you help check what the root cause of the error is?

e5114ca4f761df81d8b9b82a00be457_0.log from 5873 to 150695
[ERROR] 2023-05-04 05:20:12.860 [[logstash_3]<azure_blob_storage] javapipeline - A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:logstash_3
Plugin: <LogStash::Inputs::AzureBlobStorage container=>"###-####-aks-####", codec=><LogStash::Codecs::Line id=>"line_47cba90b-c584-4aa7-9835-####", enable_metric=>true, charset=>"UTF-8", delimiter=>"\n">, logtype=>"raw", prefix=>"logs/####", storageaccount=>"####", access_key=>, interval=>300, id=>"####", debug_until=>1000, addfilename=>true, registry_path=>"elk_registry_data/registry_container_logs.dat", enable_metric=>true, dns_suffix=>"core.windows.net", registry_create_policy=>"resume", addall=>false, debug_timer=>false, skip_learning=>false, file_head=>"{"records":[", file_tail=>"]}", path_filters=>["**/*"]>
Error: InvalidBlobType (409): The blob type is invalid for this operation.
RequestId:88bf8c47-f01e-000c-0648-####
Time:2023-05-04T05:20:12.8424899Z
Exception: Azure::Core::Http::HTTPError
Stack: /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:154:in call' org/jruby/RubyMethod.java:116:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/signer_filter.rb:28:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:111:in block in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/service.rb:36:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/filtered_service.rb:34:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/signed_service.rb:41:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/storage/common/service/storage_service.rb:60:in call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:179:in call' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/block.rb:276:in list_blob_blocks'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:413:in partial_read' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:271:in block in run'
org/jruby/RubyHash.java:1519:in each' /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:246:in run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:414:in inputworker' /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in block in start_input'
[INFO ] 2023-05-04 05:20:14.062 [[logstash_3]<azure_blob_storage] azureblobstorage - resuming from remote registry elk_registry_data/registry_container_logs.dat

Parse Error

I am getting the error below:

[ERROR] 2023-05-25 04:22:59.725 [[main]<azure_blob_storage] json - JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Illegal unquoted character ((CTRL-CHAR, code 27)): has to be escaped using backslash to be included in string value

The error seems to be caused by an ESC character in the original input log file. Would it be possible for this plugin to have a parameter to ignore this error and still forward the logs to the filter plugins?

Logstash plugin installation Faraday version mismatch error

Hi There,

While trying to install the Logstash plugin we got the following error:
Using bundled JDK: /usr/share/logstash/jdk
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Validating logstash-input-azure_blob_storage
Installing logstash-input-azure_blob_storage
Plugin version conflict, aborting
ERROR: Installation Aborted, message: Bundler could not find compatible versions for gem "faraday":
In snapshot (Gemfile.lock):
faraday (= 1.7.0)

In Gemfile:
logstash-input-azure_blob_storage was resolved to 0.11.3, which depends on
azure-storage-blob (~> 1.0) was resolved to 1.1.0, which depends on
azure-core (~> 0.1.13) was resolved to 0.1.15, which depends on
faraday (~> 0.9)

logstash-core was resolved to 7.15.0, which depends on
  elasticsearch (~> 7) was resolved to 7.13.3, which depends on
    elasticsearch-transport (= 7.13.3) was resolved to 7.13.3, which depends on
      faraday (~> 1)

Running bundle update will rebuild your snapshot from scratch, using only
the gems in your Gemfile, which may resolve the conflict.

Bundler could not find compatible versions for gem "logstash-input-azure_blob_storage":
In Gemfile:
logstash-input-azure_blob_storage

How can this be resolved? I don't understand the error. Please help.
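One approach that may resolve this, as a sketch: the faraday ~> 0.9 pin comes from the azure-core dependency of plugin version 0.11.3, while the stack traces elsewhere in this document show that newer releases (e.g. 0.12.7) use azure-storage-common 2.x instead of azure-core, so installing a newer plugin release might avoid the conflict. The version number below is an assumption about what is compatible with your Logstash:

/usr/share/logstash/bin/logstash-plugin remove logstash-input-azure_blob_storage
/usr/share/logstash/bin/logstash-plugin install --version 0.12.7 logstash-input-azure_blob_storage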

Plugin input is not parsing through CSV properly

Hi @janmg,

I am trying to fetch a CSV file from a storage account. How does this plugin parse a CSV file? The Logstash csv filter is only parsing the header and not the rest of the rows.
Also, the whole CSV is being passed in the message field. Can you please help me out?

input {
  azure_blob_storage {
    storageaccount => "storagename"
    access_key => ""
    container => "container name"
    path_filters => ['**/*.csv']
    addfilename => true
    #registry_create_policy => "start_fresh"
  }
}
filter {
  csv {
    separator => ","
    skip_header => "true"
    columns => ["head","sales","team"........]
    #source => "message"
  }
}
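If the whole CSV arrives as a single event in the message field, one hedged approach is to set a line codec on the input so that each CSV row becomes its own event before the csv filter runs. A sketch reusing the options from the config above (whether the plugin feeds the codec line by line in all cases is an assumption):

input {
    azure_blob_storage {
        storageaccount => "storagename"
        access_key => ""
        container => "container name"
        path_filters => ['**/*.csv']
        addfilename => true
        # emit one event per line instead of one event per blob
        codec => "line"
    }
}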

Exclude prefix or only get latest files?

I'm reading logs from AKS.
The container already contains logs from previous months and from multiple different clusters.
The plugin seems to read the earliest available logs.
However, I'd like to start with the latest and ignore all logs older than x days, or exclude a prefix.

For testing purposes I've added a prefix for now, but I don't want to change it each month, and I also don't want to create multiple inputs per cluster.
resourceId=/SUBSCRIPTIONS/12345/RESOURCEGROUPS/RGAKSXYZ/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/CLUSTER1/y=2020/m=05

Is there already a solution to this which I've overlooked?
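A hedged sketch using path_filters globs (an option shown in other configs in this document) to restrict which blobs are read. I can't confirm whether exclusion patterns or an age cut-off are supported, so this only shows positive matching on the cluster and date segments; the account, container and pattern below are placeholders:

input {
    azure_blob_storage {
        storageaccount => "myaksstorage"
        access_key => "..."
        container => "insights-logs-kube-audit"
        # only read blobs for CLUSTER1 from May 2020
        path_filters => ['**/MANAGEDCLUSTERS/CLUSTER1/y=2020/m=05/**/*.json']
        interval => 60
    }
}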

Facing issue in reading SignIn Logs from Azure Blob Storage

Hi,
I am trying to push Azure sign-in logs to Graylog using Logstash.
I have exported the Azure sign-in logs to Azure Blob Storage.
I have also configured the azure_blob_storage input plugin in Logstash.
Logstash is running properly; however, it is not sending some of the sign-in logs, even though they are there in blob storage.
When I checked the logs, I noticed that there are continuous errors regarding partial_read.
Error:

[2023-07-07T03:50:07,756][ERROR][logstash.javapipeline    ][main][39346dc9201d4b47a873dce433876729e693e1b862a5cab7a602272273e1c31c] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::AzureBlobStorage container=>"insights-logs-managedidentitysigninlogs", registry_local_path=>"/usr/share/logstash/pipeline/reg_managedidentitysigninlogs", codec=><LogStash::Codecs::JSON id=>"json_f03b1fcf-b19e-4148-8873-c1be979e0c1b", enable_metric=>true, charset=>"UTF-8">, path_filters=>["**/*.json"], prefix=>"tenantId=<TENANT_ID>", registry_create_policy=>"resume", interval=>20, skip_learning=>true, dns_suffix=>"core.usgovcloudapi.net", id=>"<ID>", connection_string=><password>, enable_metric=>true, logtype=>"raw", registry_path=>"data/registry.dat", addfilename=>false, addall=>false, debug_until=>0, debug_timer=>false, file_head=>"{\"records\":[", file_tail=>"]}">
  Error: InvalidBlobType (409): The blob type is invalid for this operation.
RequestId:XXXXXXXXXXXXXXXX
Time:2023-07-07T03:50:07.7508606Z
  Exception: Azure::Core::Http::HTTPError
  Stack: /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:154:in `call'
org/jruby/RubyMethod.java:116:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/signer_filter.rb:28:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/http/http_request.rb:111:in `block in with_filter'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/service.rb:36:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/filtered_service.rb:34:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/core/signed_service.rb:41:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-common-2.0.4/lib/azure/storage/common/service/storage_service.rb:60:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:179:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/block.rb:276:in `list_blob_blocks'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:413:in `partial_read'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:271:in `block in run'
org/jruby/RubyHash.java:1519:in `each'
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-input-azure_blob_storage-0.12.7/lib/logstash/inputs/azure_blob_storage.rb:246:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:414:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in `block in start_input'

Here is my logstash configuration:

input {
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-managedidentitysigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_managedidentitysigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-noninteractiveusersigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_noninteractiveusersigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix =>  "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-serviceprincipalsigninlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_serviceprincipalsigninlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix =>  "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
     azure_blob_storage {
         connection_string => "DefaultEndpointsProtocol=https;AccountName=<BLOB_STORAGE_ACCOUNT>;AccountKey=<ACCOUNT_KEY>;EndpointSuffix=core.usgovcloudapi.net"
         dns_suffix => "core.usgovcloudapi.net"
         container => "insights-logs-signinlogs"
         registry_create_policy => "resume"
         registry_local_path => "/usr/share/logstash/pipeline/reg_signinlogs"
         codec => "json"
         interval => 20
         skip_learning => true
         prefix => "<TENANT_ID>"
         path_filters => ['**/*.json']
     }
}

filter {

    mutate {
        add_field => {"short_message" => ["Azure Signin"]}
        add_field => { "host" => "logstash-signin" }
    }
    date {
        match => ["unixtimestamp", "UNIX"]
    }
}

output {
    gelf {
        host => "<GRAYLOG IP>"
        port => 12201
        protocol => "TCP"
    }
}

Plugin not creating registry file

Hi Jan,

Any chance you could suggest a way to troubleshoot this? I have two Azure resource groups - one in a Dev environment and the other in a Production environment. Each one has a Storage Account with a container. The permissions and config seem identical.

Running your plugin against the Dev environment works - blob log files (json lines) are ingested by your plugin and sent up to elasticsearch.

Running the plugin against the Production environment produces no visible errors (in DEBUG mode); however, no data is ingested and no data/registry.dat file is created. I have tried using resume, start_over, and start_fresh.

The input config looks like this:

input {
    azure_blob_storage {
        storageaccount => "some-storage-account"
        access_key => "some-key"
        container => "logcontainer"
        codec => "line"
        registry_create_policy => "start_fresh"
        path_filters => ['2020-05-14/11/*']
        interval => 60
    }
}

The blob files in each environment are the same. Size of each file is between 5kb and 5mb.

Any suggestions for how to troubleshoot / trace in more detail?

Many thanks,
Sera
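
A hedged troubleshooting sketch, reusing options that appear elsewhere on this page (registry_local_path, debug_until, debug_timer); the path and values below are only illustrative. One common cause of a silent run with no registry is a path_filters glob that matches nothing in that container, so it may be worth commenting the glob out first:

input {
    azure_blob_storage {
        storageaccount => "some-storage-account"
        access_key => "some-key"
        container => "logcontainer"
        codec => "line"
        registry_create_policy => "start_fresh"
        # write the registry to a local file so it is easy to see whether it gets created at all
        registry_local_path => "/usr/share/logstash/data/reg_logcontainer"
        # log extra detail for the first 100 events and time the listing/processing loop
        debug_until => 100
        debug_timer => true
        # temporarily disabled to rule out a glob that matches no blobs in this container
        #path_filters => ['2020-05-14/11/*']
        interval => 60
    }
}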

Unable to configure multiple paths in logstash-input-azure_blob_storage-0.11.1

Dear everyone,
I try to configure multiple paths with "prefix" and treat it like list type as below:
prefix => [ "path/to/first/", "path/to/second/" ]

Although I tried changing the "prefix" in the code as below, it still does not work:
config :prefix, :validate => :array, :default => [], :required => false

I am very new to Ruby, so I can't do more myself. Your plug-in is amazing; I had tried and debugged the logstash-input-azureblob plug-in many times with the huge log files in Application Insights.
Is there a problem if we use both logstash-input-azure_blob_storage and logstash-input-azure_blob to access a storage account at the same time?
Would you please fix this problem? Thank you very much.
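
A possible workaround while prefix remains a single string, sketched on the assumption that path_filters (which already accepts an array of globs in this plugin) can express the same restriction; the paths are illustrative:

input {
    azure_blob_storage {
        storageaccount => "<STORAGE_ACCOUNT>"
        access_key => "<ACCESS_KEY>"
        container => "<CONTAINER>"
        # prefix narrows the server-side listing to one string only;
        # path_filters is matched per blob name, so several paths can be combined here
        path_filters => ['path/to/first/**', 'path/to/second/**']
        interval => 60
    }
}

Another option that works today is declaring two azure_blob_storage input blocks, one per prefix, each with its own registry_path so they do not overwrite each other's registry, at the cost of two listings.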

log_type for azure apim diagnostic logs

I need to know if this plugin supports reading the diagnostic logs that are forwarded to blob storage.
I get json parse errors.

The APIM logs look like this; I have 2 log lines in this blob. What is the corresponding log type?

{ "DeploymentVersion": "0.30.1742.0", "Level": 4, "isRequestSuccess": true, "time": "2022-09-16T15:22:06.5944852Z", "operationName": "Microsoft.ApiManagement/GatewayLogs", "category": "GatewayLogs", "durationMs": 17, "callerIpAddress": "76.75.136.14", "correlationId": "c355dc72-0dbc-4e0d-b34c-9e02e1f96e9a", "location": "Canada Central", "properties": {"method":"GET","url":"https://apim-oagelk-dev-002.azure-api.net/api/fhir/test","backendResponseCode":200,"responseCode":200,"responseSize":486,"cache":"none","backendTime":16,"requestSize":660,"apiId":"providertemplate","operationId":"get-test","clientProtocol":"HTTP/1.1","backendProtocol":"HTTP/1.1","apiRevision":"1","clientTlsVersion":"1.2","backendMethod":"GET","backendUrl":"https://simulatedbackend.azurewebsites.net/api/HttpTrigger2/test","requestHeaders":{"Authorization":"Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiIiLCJpc3QiOiIiLCJpYXQiOjE2NjMzNDE3MjYsImV4cCI6MTY2MzM0MTc1NiwianRpIjoiand0X25vbmNlIiwidXNlcm5hbWUiOiJ0ZXN0dXNlcjMifQ.FvMLBnwnVP2XKKmLGvoQUtDgaQE-gtcCYYaYq-gJ4DU"},"traceRecords":[{"message":"testuser3","severity":"Information","timestamp":"2022-09-16T15:22:06.5944852Z"},{"message":"{ "test": "jsonformat message"}","severity":"Information","timestamp":"2022-09-16T15:22:06.5944852Z"}]}, "resourceId": "/SUBSCRIPTIONS/ABDE4308-7B2A-43F0-8557-3ACFFB6BEE0B/RESOURCEGROUPS/RG-OAGELK-DEV-001/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/APIM-OAGELK-DEV-002"}
{ "DeploymentVersion": "0.30.1742.0", "Level": 4, "isRequestSuccess": true, "time": "2022-09-16T15:39:20.9565494Z", "operationName": "Microsoft.ApiManagement/GatewayLogs", "category": "GatewayLogs", "durationMs": 72, "callerIpAddress": "76.75.136.14", "correlationId": "371bbe9c-e663-49a7-904a-30e3fcc93700", "location": "Canada Central", "properties": {"method":"GET","url":"https://apim-oagelk-dev-002.azure-api.net/api/fhir/sign","responseCode":200,"responseSize":93,"cache":"none","apiId":"providertemplate","operationId":"get-sign","clientProtocol":"HTTP/1.1","apiRevision":"1","clientTlsVersion":"1.2","requestHeaders":{"Authorization":"Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiIiLCJpc3QiOiIiLCJpYXQiOjE2NjMzNDI3NjAsImV4cCI6MTY2MzM0Mjc5MCwianRpIjoiand0X25vbmNlIiwidXNlcm5hbWUiOiJ0ZXN0dXNlcjMifQ.3_yEz3toqK4yEAfec0Xa7A2UMYg4g0FOk4usNh_-nFI"}}, "resourceId": "/SUBSCRIPTIONS/ABDE4308-7B2A-43F0-8557-3ACFFB6BEE0B/RESOURCEGROUPS/RG-OAGELK-DEV-001/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/APIM-OAGELK-DEV-002"}

Logstash not able to read and process .csv file

I am reading .csv file from the Azure storage blob and adding that data in Elasticsearch.
But logstash gets stuck after the below lines.

[2023-06-16T18:30:28,479][INFO ][logstash.inputs.azureblobstorage][main][181f6119611506bdf4dacc3ea01b35a1dee26ed795d3b801c7e8f3d7080a0e8c] learn json one of the attempts failed
[2023-06-16T18:30:28,479][INFO ][logstash.inputs.azureblobstorage][main][181f6119611506bdf4dacc3ea01b35a1dee26ed795d3b801c7e8f3d7080a0e8c] head will be: {"records":[ and tail is set to ]}

Below is the pipeline code:

input {
    azure_blob_storage {
        storageaccount => "xxx"
        access_key => "yyy"
        container => "zzz"
        #path_filters => ['**/invoices_2023.03.04_20.csv']
    }
}
filter {
    mutate {
        gsub => ["message", "\"\"", ""]
    }
    csv {
        separator => ","
        columns => ["uniqueId","vendorId","vendorNumber","vendorName","reference","documentDate","purchaseOrderNumber","currency","amount","dueDate","documentStatus","invoiceBlocked","paymentReference","clearingDate","paymentMethod","sourceSystem","documentType","companyCode","companyCodeName","companyCodeCountry","purchaseOrderItem","deliveryNoteNumber","fiscalYear","sapInvoiceNumber","invoicePendingWith"]
    }
    if [uniqueId] == "Unique Key" {
        drop { }
    }
    if [uniqueId] =~ /^\s*$/ {
        drop { }
    }
    mutate { remove_field => "message" }
    mutate { remove_field => "@version" }
    mutate { remove_field => "host" }
    mutate { add_field => { "docTypeCode" => "1019" }}
    # mutate { remove_field => "@timestamp" }
}
output {
        elasticsearch {
            hosts => "aaa:9243"
            user => "bbb"
            password => "ccc@1234"
            index => "invoicehub_test"
            document_type  => "_doc"
            action => "index"
            document_id => "%{uniqueId}"
        }
       # stdout { codec => rubydebug { metadata => true} }
}

CSV file contains data:
Unique Key,Vendor Id,Vendor Number # VN,Vendor Name,Reference,Document Date,Purchase order number,Currency,Amount,Due Date,Document Status,Invoice blocked,Payment Reference Number,Payment/Clearing Date,Payment Method,Source System,Document Type,Company Code,Company Name,Company code country,Purchase order item,Delivery note number,FiscalYear,SAP Invoice Number,Invoice Pending with (email id)
AMP_000013530327,50148526,50148526,CARTUS RELOCATION CANADA LTD,2000073927CA,10/21/2019,,CAD,2041.85,11/21/2019,Pending Payment,,,,,AMP,Invoice,1552,Imperial Oil-DS Br,CA,,,2019,2019,
AMP_000013562803,783053,783053,CPS COMUNICACIONES SA,A001800009476,11/1/2019,,ARS,1103.52,12/1/2019,Pending Payment,,,,,AMP,Invoice,2399,ExxonMobil B.S.C Arg. SRL,AR,,,2019,2019,
AMP_000013562789,50115024,50115026,FARMERS ALLOY FABRICATING INC,7667,11/5/2019,4410760848,USD,-38940.48,12/5/2019,In Progress,,,,,AMP,Credit Note,944,EM Ref&Mktg (Div),US,,,0,0,[email protected]
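
A hedged sketch, assuming the hang is related to the head/tail learning step being applied to a file that is not JSON; logtype, codec and skip_learning are existing options of this plugin, but whether they resolve the hang for this particular file is untested here:

input {
    azure_blob_storage {
        storageaccount => "xxx"
        access_key => "yyy"
        container => "zzz"
        # the blobs are CSV, not JSON: skip the JSON head/tail learning
        # and emit one event per line instead of treating the blob as one JSON document
        logtype => "raw"
        codec => "line"
        skip_learning => true
        #path_filters => ['**/invoices_2023.03.04_20.csv']
    }
}

The rest of the pipeline (the csv filter and the elasticsearch output) would stay as above.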

Not all events are parsed

Hi,
Nice to meet you, and congratulations on the plugin; it's very cool!

I'm reaching out to you because I've set up a Logstash pipeline with your plugin to ship logs from SAP Commerce Cloud (Azure Blob Storage) to a custom Elasticsearch.
Here's my pipeline:

input {
    azure_blob_storage {
        storageaccount => "<STORAGE_ACCOUNT>"
        access_key => "<ACCESS_KEY>"
        container => "commerce-logging"
        interval => 300
    }
}

output {
    elasticsearch {
        hosts => ["HOST"]
        index => "test-index"
        user => "ES_USER"
        password => "ES_USER"
    }
    stdout { codec => rubydebug }
}

From the logs I can see that only a few logs are processed (and then sent to my custom ES):
Here's what I see from logs:
processed 1 events, saving 452 blobs and offsets to remote registry data/registry.dat

I've enabled debug mode for this issue with the settings below:
...
debug_timer => true
debug_until => 100
...

I'm attaching the logs.

I hope you can help me in this!

Kind Regards,
Luigi

logstash_issue_azure_blob_storage.log

I need to create a prefix with the format yyyy/MM/dd

I need to create a prefix that allows me to have the year, month and day as a variable so that my logstash process does not analyze the millions of records I have in my storage but only uses the folder of the current day.

I tried this, but it doesn't work:

        prefix => "%{YEAR}/%{MONTHNUM}/%{MONTHDAY}/"

Split the long json log files

Hello,

We tried the azure_blob_storage input plugin with the "json" codec; our requirement is to split the long JSON file into small units and then process them. We were looking for an option like "break_json_down_policy" in this plugin. Currently we are getting the results shown below.

Event#1
{

    "Computer": "***",
  "LogEntrySource": "stderr",
***** o/p *** omitted.
}

{

    "Computer": "***",
"LogEntrySource": "stderr",
***** o/p *** omitted,
}

What we are trying to achieve is as shown below

Event#1
{

    "Computer": "***",
  "LogEntrySource": "stderr",
***** o/p *** omitted.
}

Event #2
{

    "Computer": "***",
"LogEntrySource": "stderr",
***** o/p *** omitted,
}
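
A hedged sketch of one common approach, assuming the blobs wrap the individual entries in a top-level records array (which is what the plugin's default file_head {"records":[ and file_tail ]} suggest); after the json codec has parsed the document, the standard Logstash split filter turns each array element into its own event:

filter {
    # one incoming event carries the whole parsed blob;
    # split re-emits it once per element of the records array
    split {
        field => "records"
    }
}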

No filename when using line codec (and extra "\r" character at the end of every line)

Thank you for this very nice plug-in. I am trying to use it for CSV files with the "line" codec, but each line gets an extra "\r" character at the end; see the event.original field below.

Also - is there a way to capture the path and name of the source file within its container as a field in the event?

{
  "@timestamp": [
    "2023-12-04T22:31:16.452Z"
  ],
  "@version": [
    "1"
  ],
  "data_stream.dataset": [
    "generic"
  ],
  "data_stream.namespace": [
    "default"
  ],
  "data_stream.type": [
    "logs"
  ],
  "event.original": [
    "37,1,,0,0,0,0,BILLING,,Email,0,0,0,2022-03-31 03:22:29.990\r"
  ],
  "message": [
    "37,1,,0,0,0,0,BILLING,,Email,0,0,0,2022-03-31 03:22:29.990\r"
  ],
  "_id": "X8j1NowBpjArVxS9BV6_",
  "_index": ".ds-logs-generic-default-2023.12.04-000001",
  "_score": null
}
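
A hedged sketch for both questions, assuming the addfilename option that appears in this plugin's option list (default false) adds the source blob path to the event when enabled, and that the trailing \r simply comes from CRLF line endings, which a mutate gsub can strip:

input {
    azure_blob_storage {
        storageaccount => "<STORAGE_ACCOUNT>"
        access_key => "<ACCESS_KEY>"
        container => "<CONTAINER>"
        codec => "line"
        # assumption: adds the blob path/name to each event (option visible in the plugin dumps above)
        addfilename => true
    }
}
filter {
    # remove the carriage return left over from CRLF line endings
    mutate {
        gsub => ["message", "\r$", ""]
    }
}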
