
logsuck's Issues

"Show source" button for events

Splunk has a "show source" button on each event which takes you to a fairly raw view of the file the event came from. Since logsuck does not persist the original file contents (I wonder if Splunk does this?) such a view would need to be reassembled from the EventRaws table.

Maybe an initial version of this could be implemented as a generating pipeline step, such as | surrounding eventid=123, with the output displayed like normal search results. A custom GUI that looks more like a raw text file could then be added later.
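
A rough sketch of what the backend query for such a step could look like, reusing the Events/EventRaws schema from the SQL query further down this page (the type and function names are made up):

package search

import "database/sql"

// SurroundingEvent is a hypothetical result row for a "| surrounding" step.
type SurroundingEvent struct {
    ID        int64
    Timestamp string
    Raw       string
}

// surrounding fetches the events from the same source as the given event,
// nearest by ID, so that a "show source"-like view can be reassembled from
// the EventRaws table. Table and column names follow the SQL shown further
// down this page; everything else is illustrative.
func surrounding(db *sql.DB, eventID int64, limit int) ([]SurroundingEvent, error) {
    rows, err := db.Query(`
        SELECT e.id, e.timestamp, r.raw
        FROM Events e
        INNER JOIN EventRaws r ON r.rowid = e.id
        WHERE e.source = (SELECT source FROM Events WHERE id = ?)
        ORDER BY ABS(e.id - ?)
        LIMIT ?`, eventID, eventID, limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var ret []SurroundingEvent
    for rows.Next() {
        var ev SurroundingEvent
        if err := rows.Scan(&ev.ID, &ev.Timestamp, &ev.Raw); err != nil {
            return nil, err
        }
        ret = append(ret, ev)
    }
    // The caller would sort these by timestamp (or ID) before displaying them
    // as a contiguous chunk of the original file.
    return ret, rows.Err()
}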

E-mail alerts

  • Add a page for CRUDing alerts
  • An alert consists of a name, a description, a search query, a duration, and the "send configuration"
  • In the case of e-mail, the "send configuration" is just a text field to set the recipient address
  • The "send configuration" should be a bit flexible - initially only e-mail will supported, but this should be a part of Logsuck that is pluggable in the future so you can add Slack or Zapier integrations or whatever.
  • Other configuration (mail server, sender address, whatever) is done in logsuck.json
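
A rough sketch of how the alert and the pluggable send configuration could be modeled (none of these types exist yet, the names are just illustrative):

package alerts

import "time"

// Alert mirrors the fields listed above: name, description, search query,
// duration and a send configuration.
type Alert struct {
    Name        string
    Description string
    Query       string
    Duration    time.Duration
    // SenderType selects which AlertSender to use, e.g. "email".
    SenderType string
    // SenderConfig is the free-form "send configuration". For e-mail it
    // could simply be {"recipient": "ops@example.com"}.
    SenderConfig map[string]string
}

// AlertSender is the pluggable part. An e-mail sender would be the first
// implementation; Slack, Zapier etc. could be added later by registering
// more implementations under new SenderType keys.
type AlertSender interface {
    Send(alert Alert, searchResults []string) error
}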

Add a pipeline step for ad hoc field extraction ("| rex")

The rex command is IMO one of the most useful commands in Splunk, so I would like to have something similar to it in logsuck.

Using the rex command could look something like this:

userid | rex "userid was (?P<userid>\d+)."

If the raw value of an event then looks like this:

2021-01-20 19:37:00 The user did something. The userid was 123.

a field named userid should be extracted with the value being 123.

Differences from Splunk's rex

  • The regex should follow logsuck field extractor rules, meaning that the regex must either specify a single named capture group or exactly two unnamed capture groups.
  • Named capture groups are written (?P<name>...) instead of Splunk's (?<name>...)
  • logsuck rex will not initially support the mode=sed, max_match or offset_field options.
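
A minimal sketch of the extraction logic under those rules, using Go's regexp package (which is where the (?P<name>...) syntax comes from); the function name is made up:

package pipeline

import (
    "fmt"
    "regexp"
)

// rexExtract applies a rex-style regex to the raw event text and returns the
// extracted fields, following the field extractor rules described above:
// either a single named capture group (the group name becomes the field name)
// or exactly two unnamed groups (field name, then value).
func rexExtract(pattern, raw string) (map[string]string, error) {
    re, err := regexp.Compile(pattern)
    if err != nil {
        return nil, err
    }
    fields := map[string]string{}
    match := re.FindStringSubmatch(raw)
    if match == nil {
        return fields, nil
    }
    names := re.SubexpNames() // names[0] is always "" (the whole match)
    switch {
    case len(names) == 2 && names[1] != "":
        // A single named capture group: the group name is the field name.
        fields[names[1]] = match[1]
    case len(names) == 3 && names[1] == "" && names[2] == "":
        // Exactly two unnamed groups: first is the field name, second the value.
        fields[match[1]] = match[2]
    default:
        return nil, fmt.Errorf("regex must have a single named group or exactly two unnamed groups")
    }
    return fields, nil
}

With the pattern from the example above, rexExtract would return map[userid:123] for the example log line.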

Example JSON config file

First off, I absolutely love what you're doing here. I'm just some guy and not a data scientist, and trying to get Splunk up and running has been an exercise in frustration. I stumbled across logsuck when looking for alternatives and thank. god. for you.

It would be incredibly helpful to have a sample of a completed JSON file that I can use to configure the program, however. When manually editing the JSON I'm not sure what needs to stay and what can be removed, and tinkering with the indenting is tedious and confusing. It's great that the schema is there, but getting from schema -> working JSON is proving to be an absolute bitch for me, unless there's some trick I'm missing (likely since I've only been programming for ~1 year).

Thanks for everything you're doing!!!!

Add a pipeline step for filtering by field value ("| where")

When using | rex, it's currently impossible to filter by the extracted values. I will add a | where command to support this use case.

Using the same example as in #13, it could look something like this:

userid | rex "userid was (?P<userid>\d+)\." | where userid="123"

This should match

2021-01-20 19:37:00 The user did something. The userid was 123.

But not

2021-01-20 19:37:00 The user did something. The userid was 456.

Differences from Splunk's | where

  • Initially, logsuck where will only support this syntax: <field>=<quoted string>, which performs a string comparison between the field value and the given string. More advanced syntax can be added later.
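
A minimal sketch of that initial <field>=<quoted string> behavior, assuming the | rex step has already produced a map of extracted fields (names are illustrative):

package pipeline

// whereFilter implements the initial "| where" semantics described above:
// keep an event only if the given field exists and its value is string-equal
// to the wanted value. More advanced expressions can be layered on later.
func whereFilter(fields map[string]string, field, wanted string) bool {
    value, ok := fields[field]
    return ok && value == wanted
}

With the example above, whereFilter(fields, "userid", "123") keeps the first event and drops the second.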

Cannot perform searches that only contain "NOT"s

For example you cannot search for source!=*access*.

This seems to be a limitation in SQLite FTS. When running this query:

SELECT
	e.id,
	e.host,
	e.source,
	e.timestamp,
	r.raw
FROM
	Events e
INNER JOIN EventRaws r ON
	r.rowid = e.id
WHERE
	e.id < 748246
	AND e.timestamp >= '2021-01-09 18:10:59.6623322 +0100 CET m=-886.364488199'
	AND EventRaws MATCH 'NOT source:*access* '
ORDER BY
	e.timestamp DESC,
	e.id DESC;

directly against the database, I get the following error: malformed MATCH expression: [NOT source:*access* ]

I'm sure there's a way to work around it.
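
One possible (untested) workaround: when the parsed search contains only negated terms, skip the MATCH clause entirely and express the exclusions as plain NOT LIKE conditions instead, roughly like this:

package search

import "strings"

// buildSourceExclusions turns a list of excluded source patterns (e.g. "*access*")
// into a plain SQL condition, as a fallback for searches where the FTS MATCH
// expression would consist solely of NOT terms. Wildcards are translated to
// LIKE wildcards. This is a sketch, not the actual Logsuck query builder.
func buildSourceExclusions(excludedSources []string) (string, []interface{}) {
    conditions := []string{}
    args := []interface{}{}
    for _, pattern := range excludedSources {
        conditions = append(conditions, "e.source NOT LIKE ?")
        args = append(args, strings.ReplaceAll(pattern, "*", "%"))
    }
    return strings.Join(conditions, " AND "), args
}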

Non-relative time spans

Currently the Logsuck GUI only supports relative time spans ("Last 15 minutes", "Last 7 days", etc.). It should also be possible to specify absolute time spans like "2021-01-25 20:07 to 2021-01-25 23:01".

  • Add from and to fields to the time picker
  • It should be possible to leave either field blank, in which case the search will be treated as "before the to date" or "after the from date"
  • A "today" option which fills the from date with today's date at 00:00:00 and clears the to date might be nice

Fragments must be lower case when searching

For example, if a log contains userId and you search for userId you won't get a match, but if you search for userid you will. This is due to the filtering done in Filter.go in addition to the filtering done in SQLite, a carry-over from before Logsuck used SQLite.

It might be possible to remove this filtering entirely, which should also speed up the search since events won't need to be regex-matched against each fragment.

Benchmark read performance

Logsuck needs to be reasonably fast at reading logs and putting them into the database. I have a few ideas on how to improve this part of Logsuck, but since there is currently no way to objectively measure the performance, the first step is to add a way to benchmark it.

The idea is to create a program that does the following:

  • Start logsuck and wait for it to be "stable" - I guess the "Starting Web GUI" log is a pretty good indicator of this
  • Concurrently start logdunk with 0 sleepTime, meaning it dunks HARD
  • Let the programs run for x seconds and then kill them both
  • Log the following: How long the programs ran for, how many lines are in the generated log, how many events were processed per second
  • Clean up the database file and generated log file

This should be expanded on with more advanced cases like multiple log files and forwarder/recipient mode later.
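
A rough sketch of what the benchmark program could look like. The binary names, the -sleepTime flag, the log and database file names, and the assumption that logsuck logs to stderr are all guesses based on the description above, not verified against the real programs:

package main

import (
    "bufio"
    "fmt"
    "os"
    "os/exec"
    "strings"
    "time"
)

func main() {
    runFor := 60 * time.Second

    // Start logsuck and wait for the "Starting Web GUI" log line before
    // starting the log generator, so startup time is not counted.
    logsuck := exec.Command("./logsuck")
    stderr, _ := logsuck.StderrPipe()
    if err := logsuck.Start(); err != nil {
        panic(err)
    }
    scanner := bufio.NewScanner(stderr)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Starting Web GUI") {
            break
        }
    }

    // Start logdunk with no sleep between writes, so it dunks HARD.
    logdunk := exec.Command("./logdunk", "-sleepTime", "0")
    if err := logdunk.Start(); err != nil {
        panic(err)
    }

    // Let both programs run for the configured duration, then kill them.
    start := time.Now()
    time.Sleep(runFor)
    logdunk.Process.Kill()
    logsuck.Process.Kill()
    elapsed := time.Since(start)

    // Use the number of lines in the generated log as a proxy for how many
    // events were processed.
    lines := countLines("logdunk.log")
    fmt.Printf("ran for %v, %d lines, %.0f events/second\n",
        elapsed, lines, float64(lines)/elapsed.Seconds())

    // Clean up the database file and the generated log file.
    os.Remove("logsuck.db")
    os.Remove("logdunk.log")
}

func countLines(fileName string) int {
    f, err := os.Open(fileName)
    if err != nil {
        return 0
    }
    defer f.Close()
    n := 0
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        n++
    }
    return n
}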

Delete old events / job results

To avoid the database file growing too big and to avoid having years old PII lying around in the database file, it should be possible to configure Logsuck to delete old data.

  • It should be possible to configure separate retention durations for events and for job results
  • Job results should probably have a pretty low duration by default since they aren't very reusable
  • Retention should be configured using Go duration strings, e.g. "168h" = delete events older than 7 days (standard Go duration strings have no "d" unit)
  • The age of events will be based on the extracted _time field for now, which means that you can't retroactively load old logs into Logsuck without changing the dates in them or increasing the retention setting. Maybe a write_time column should be added to Events and used instead to support this use case.
  • The age of jobs will be based on the end time of the job (new columns need to be added to the Jobs table for this)
  • There will not be support for infinite retention
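
A minimal sketch of the deletion itself, assuming the retention string has already been parsed into a time.Duration and that event age is based on the timestamp column; table and column names follow the queries elsewhere on this page, the rest is illustrative:

package retention

import (
    "database/sql"
    "time"
)

// deleteOldEvents removes events (and their raw text) older than the given
// retention duration. Deleting from EventRaws by rowid mirrors the join used
// by the search queries. How the cutoff compares against the stored timestamp
// depends on the timestamp format, so treat this as a sketch.
func deleteOldEvents(db *sql.DB, retention time.Duration) error {
    cutoff := time.Now().Add(-retention)
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()
    if _, err := tx.Exec(
        "DELETE FROM EventRaws WHERE rowid IN (SELECT id FROM Events WHERE timestamp < ?)",
        cutoff); err != nil {
        return err
    }
    if _, err := tx.Exec("DELETE FROM Events WHERE timestamp < ?", cutoff); err != nil {
        return err
    }
    return tx.Commit()
}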

Search logs an error if there are no events in the database

When a search is started, SELECT MAX(id) FROM Events is executed and all subsequent queries are filtered to events with an ID less than this value. This was added to avoid weird behavior when events were being added while a search was executing.

This query does not work when Events is empty; in that case you get the following error message: error when scanning max(id) in FilterStream: sql: Scan error on column index 0, name "MAX(id)": converting NULL to int is unsupported.

There are two ways I can imagine fixing this, the first one being to use a SELECT COUNT(1) FROM Events before issuing the MAX query, and the second being to do a string match on the error, similar to the expectedConstraintViolationForDuplicates check. I think the second option is preferable since the first one incurs a performance hit on all searches for an extreme edge case.
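
A sketch of the second option, matching on the error string from the message above (the function name is made up):

package search

import (
    "database/sql"
    "strings"
)

// maxEventID returns the highest event ID, or 0 if the Events table is empty.
// The empty-table case is detected by matching on the error string, similar
// to the expectedConstraintViolationForDuplicates check. The substring comes
// from the error message quoted above.
func maxEventID(db *sql.DB) (int64, error) {
    var max int64
    err := db.QueryRow("SELECT MAX(id) FROM Events").Scan(&max)
    if err != nil {
        if strings.Contains(err.Error(), "converting NULL to int") {
            return 0, nil // no events yet, nothing to search
        }
        return 0, err
    }
    return max, nil
}

For what it's worth, scanning into a sql.NullInt64 instead of an int64 would also avoid the error without any string matching.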

Add a dedicated search page instead of using home.tsx

home.tsx currently serves two purposes: it contains a list of recent searches but also contains the actual search functionality. When you perform a search it is handled on the client side and the URL is not updated, which means that when you refresh the page your search results are gone. You also cannot send links to searches.

home.tsx should be split into two page components, Home and Search.

Home should work as it does currently, but instead of just changing state when clicking the search button or a recent search, it should set location.href to /search with a couple of query parameters.

/search should handle the following query parameters initially:

  • query - URL encoded version of the search string
  • timespan - the time span
  • jobId - the job ID of the search

If jobId is set when the Search component mounts, it will make the necessary backend calls to reuse the results of that job. If jobId is not set and query + timespan are set, a new job will be started and the URL will be updated with the new jobId. If the user triggers a new search all three query parameters should be updated.

Support structured logging in JSON

At some point, Logsuck should support structured logging in the form of JSON.

Logsuck should handle both field extraction from JSON logs as well as have a frontend that shows the structure of the JSON object in a better way than just dumping it as a string.

A couple of things that need doing:

  • Extend logdunk so it can generate structured logs. Probably use logrus for this
  • Extend field extractors so that they can be arbitrary functions instead of just regexes, and add a field extractor which parses JSON (see the sketch after this list)
  • Update event delimiters to handle multi-line JSON (? - maybe this is optional)
  • Somehow detect that a log event is a JSON string on the frontend and display a tree view instead of a plain string
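
A rough sketch of what an "arbitrary function" field extractor interface and a JSON implementation could look like (none of these names exist in Logsuck today):

package events

import (
    "encoding/json"
    "fmt"
)

// FieldExtractor is the "arbitrary function" generalization of today's
// regex-based extractors: anything that can turn a raw event into fields.
type FieldExtractor interface {
    Extract(raw string) map[string]string
}

// JsonFieldExtractor parses events that are JSON objects and flattens them
// into fields, so that {"user": {"id": 123}} yields user.id=123.
type JsonFieldExtractor struct{}

func (j JsonFieldExtractor) Extract(raw string) map[string]string {
    fields := map[string]string{}
    var obj map[string]interface{}
    if err := json.Unmarshal([]byte(raw), &obj); err != nil {
        return fields // not JSON, extract nothing
    }
    flatten("", obj, fields)
    return fields
}

// flatten walks nested objects and joins keys with dots.
func flatten(prefix string, obj map[string]interface{}, out map[string]string) {
    for key, value := range obj {
        name := key
        if prefix != "" {
            name = prefix + "." + key
        }
        if nested, ok := value.(map[string]interface{}); ok {
            flatten(name, nested, out)
        } else {
            out[name] = fmt.Sprintf("%v", value)
        }
    }
}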

Speed up search by not using OFFSET for paginating when filtering results

When searching for events, they are retrieved in batches of 1000 using OFFSET/LIMIT. The reason we can't retrieve all results in one query is that SQLite locks itself while the query is running, so we can't write the results to the JobResults table while we have the search rowset open.

The current implementation becomes slower and slower the more results we find, because OFFSET gets more expensive the bigger the offset is. By switching to keyset pagination as described in https://stackoverflow.com/questions/14468586/efficient-paging-in-sqlite-with-millions-of-records we should be able to speed up the search significantly. Testing directly against the database with ~1M events, retrieving page 200 went from taking 600 ms to 3 ms.
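
Concretely, the keyset version of the query could look roughly like this; the surrounding filters from the original query stay the same, and the three placeholders are the timestamp (twice) and the ID of the last row in the previous batch:

package search

// Keyset pagination: instead of OFFSET, each batch passes the timestamp and
// id of the last row from the previous batch, so SQLite can seek straight to
// the next rows rather than counting past all earlier ones.
const nextBatchQuery = `
SELECT e.id, e.host, e.source, e.timestamp, r.raw
FROM Events e
INNER JOIN EventRaws r ON r.rowid = e.id
WHERE (e.timestamp < ?          -- last timestamp of previous batch
       OR (e.timestamp = ?      -- ties on timestamp are broken by id,
           AND e.id < ?))       -- matching the ORDER BY below
ORDER BY e.timestamp DESC, e.id DESC
LIMIT 1000;`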

Preserve current result page in query params

If you perform a search, go to the second page of results and then refresh the page, currently you will be sent back to the first page of results. You also can't link a specific result page to your friends :(. To fix this, a query parameter named "page" should be added to the URL when the pagination is used to navigate to a different page.

  • Set page query parameter in onPageChanged
  • Read the page query parameter in SearchPageComponent's componentDidMount. page is only relevant if jobId is also present.

Supported features

Hi,

does logsuck support the following features?

  • indexing all files under a directory and its sub-directories
  • watching for new file creation events and automatically indexing new files
