jackbister / logsuck
Easy log aggregation, indexing and searching
License: Apache License 2.0
For example, do an empty search in a file generated by logdunk, then search for frobnosticating. The GUI will look like it found results, but the results will be the same as they were for the userId search.
The results part of the state object probably needs to be reset when the search button is pressed?
Splunk has a "show source" button on each event which takes you to a fairly raw view of the file the event came from. Since logsuck does not persist the original file contents (I wonder if Splunk does this?) such a view would need to be reassembled from the EventRaws table.
Maybe an initial version of this could be implemented as a generating pipeline step, such as | surrounding eventid=123, and be displayed as normal. A custom GUI that looks more like a raw text file could then be added later.
The rex command is IMO one of the most useful commands in Splunk, so I would like to have something similar to it in logsuck.
Using the rex command could look something like this:
userid | rex "userid was (?P<userid>\d+)\."
If the raw value of an event then looks like this:
2021-01-20 19:37:00 The user did something. The userid was 123.
a field named userid should be extracted with the value being 123.
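Go's regexp package (RE2) uses the same (?P&lt;name&gt;...) named-group syntax as Python's re module, so the intended extraction can be sketched like this (the field dictionary shape is illustrative, not logsuck's actual data model):

```python
import re

# Sketch of the field extraction the rex command would perform.
# Go's regexp package uses RE2, which shares the (?P<name>...) named
# group syntax with Python's re module, so this mirrors the behavior.
raw = "2021-01-20 19:37:00 The user did something. The userid was 123."
match = re.search(r"userid was (?P<userid>\d+)\.", raw)
fields = match.groupdict() if match else {}
# fields is {'userid': '123'}
```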
Note that Go's regexp package requires named capture groups to be written as ?P<name> instead of just ?<name> as in Splunk. An initial version will probably not support Splunk's mode=sed, max_match or offset_field options.

First off, I absolutely love what you're doing here. I'm just some guy, not a data scientist, and trying to get Splunk up and running has been an exercise in frustration. I stumbled across logsuck when looking for alternatives and thank. god. for you.
However, it would be incredibly helpful to have a sample of a completed JSON file that I can use to configure the program. Manually editing the JSON, I'm not sure what needs to stay and what can be removed, and tinkering with the indenting is tedious and confusing. It's great that the schema is there, but getting from schema -> working JSON is proving to be an absolute bitch for me, unless there's some trick I'm missing (likely, since I've only been programming for ~1 year).
Thanks for everything you're doing!!!!
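For what it's worth, the only configuration shape shown elsewhere in these issues is the files/fileName array, so a minimal starting point might look like the following. Anything beyond this key is not shown in these issues and would need to be checked against the published schema:

```json
{
  "files": [
    {
      "fileName": "/var/log/myapp/log.txt"
    }
  ]
}
```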
When using | rex, it's currently impossible to filter by the extracted values. I will add a | where command to support this use case.
Using the same example as in #13, it could look something like this:
userid | rex "userid was (?P<userid>\d+)\." | where userid="123"
This should match
2021-01-20 19:37:00 The user did something. The userid was 123.
But not
2021-01-20 19:37:00 The user did something. The userid was 456.
Initially, where will only support <field>=<quoted string>, which performs a string comparison between the field value and the given string. More advanced syntax can be added later.

The GUI seems to be updated when the first "jobStats" poll is completed, which may happen before the search has found any results.
For example, you cannot search for source!=*access*.
This seems to be a limitation in SQLite FTS, when running this query:
SELECT
e.id,
e.host,
e.source,
e.timestamp,
r.raw
FROM
Events e
INNER JOIN EventRaws r ON
r.rowid = e.id
WHERE
e.id < 748246
AND e.timestamp >= '2021-01-09 18:10:59.6623322 +0100 CET m=-886.364488199'
AND EventRaws MATCH 'NOT source:*access* '
ORDER BY
e.timestamp DESC,
e.id DESC;
directly against the database I get the following error: malformed MATCH expression: [NOT source:*access* ]
I'm sure there's a way to work around it.
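One workaround worth trying: instead of putting NOT inside the MATCH expression (which SQLite FTS rejects as a standalone query), match the positive term in a subquery and exclude those rowids. A minimal reproduction in Python, using a simplified stand-in table rather than logsuck's actual schema:

```python
import sqlite3

# Reproduce the workaround with a tiny FTS table: the NOT moves out of
# the MATCH expression and into a NOT IN over the matching rowids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE raws USING fts5(source, raw)")
conn.executemany(
    "INSERT INTO raws (source, raw) VALUES (?, ?)",
    [("access.log", "GET /index"), ("app.log", "userid was 123")],
)
rows = conn.execute(
    """
    SELECT raw FROM raws
    WHERE rowid NOT IN (
        SELECT rowid FROM raws WHERE raws MATCH 'source:access*'
    )
    """
).fetchall()
# rows contains only the event whose source does not match access*
```

Note that FTS also has no leading-wildcard support, so *access* would need to become a prefix query like access* anyway.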
If you open the popover and then close it, the pagination buttons become unclickable. This is probably because the popover is still there and blocks the clicks from reaching the pagination.
Currently the Logsuck GUI only supports relative time spans ("Last 15 minutes", "Last 7 days", etc.). It should also be possible to specify absolute time spans like "2021-01-25 20:07 to 2021-01-25 23:01".
For example, this configuration should work:
{
"files": [
{
"fileName": "/opt/mylogs/*.txt"
}
]
}
and match all .txt files in /opt/mylogs.
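The expansion itself is plain globbing; a quick illustration in Python (in Go, filepath.Glob would be the rough equivalent), with a temp directory standing in for /opt/mylogs:

```python
import glob
import os
import tempfile

# Sketch of the requested behavior: expand a configured fileName glob
# into the concrete set of files to watch. The temp directory stands in
# for /opt/mylogs since this is just an illustration.
log_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "notes.md"):
    open(os.path.join(log_dir, name), "w").close()

matched = sorted(glob.glob(os.path.join(log_dir, "*.txt")))
# matched contains a.txt and b.txt, but not notes.md
```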
For example, if a log contains userId and you search for userId you won't get a match, but if you search for userid you will. This is due to the filtering done in Filter.go in addition to the filtering being done in SQLite, which is a carry-over from before Logsuck used SQLite.
It might be possible to remove this filtering entirely, which should also speed up the search since events won't need to be regex-matched against each fragment.
The search correctly uses the same time range as the recent search did, but the time picker shows "Last 15 minutes".
Logsuck needs to be reasonably fast at reading logs and putting them into the database, and I have a few ideas on how to improve this part of Logsuck, but since there is currently no way to objectively measure the performance the first step is to add a way to benchmark this.
The idea is to create a program that does the following:
This should be expanded on with more advanced cases like multiple log files and forwarder/recipient mode later.
This happens because the SQL query does WHERE e.id < MAX(id) instead of WHERE e.id <= MAX(id).
To avoid the database file growing too big and to avoid having years old PII lying around in the database file, it should be possible to configure Logsuck to delete old data.
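A retention sweep could be as simple as deleting by timestamp. A rough sketch, assuming the Events/EventRaws layout visible in the search query in these issues (the schema here is simplified and the cutoff is illustrative, not logsuck's actual code):

```python
import sqlite3

# Hypothetical retention sweep: delete raws for expired events first,
# then the events themselves, keyed on a timestamp cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.execute("CREATE TABLE EventRaws (id INTEGER PRIMARY KEY, raw TEXT)")
conn.executemany("INSERT INTO Events VALUES (?, ?)",
                 [(1, "2019-01-01 00:00:00"), (2, "2021-01-01 00:00:00")])
conn.executemany("INSERT INTO EventRaws VALUES (?, ?)",
                 [(1, "old"), (2, "new")])

cutoff = "2020-01-01 00:00:00"
conn.execute(
    "DELETE FROM EventRaws WHERE id IN "
    "(SELECT id FROM Events WHERE timestamp < ?)", (cutoff,))
conn.execute("DELETE FROM Events WHERE timestamp < ?", (cutoff,))
remaining = conn.execute("SELECT id FROM Events").fetchall()
# remaining holds only the event newer than the cutoff
```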
When a search is started, SELECT MAX(id) FROM Events
is executed and all queries thereafter are filtered to be less than this ID. This was added to avoid weird behavior when events were being added while a search was executing.
This query does not work when Events is empty; in that case you get the following error message: error when scanning max(id) in FilterStream: sql: Scan error on column index 0, name "MAX(id)": converting NULL to int is unsupported.
There are two ways I can imagine fixing this, the first one being to use a SELECT COUNT(1) FROM Events
before issuing the MAX query, and the second being to do a string match on the error, similar to the expectedConstraintViolationForDuplicates
check. I think the second option is preferable since the first one incurs a performance hit on all searches for an extreme edge case.
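The edge case is easy to reproduce: MAX(id) over an empty table yields one row containing NULL, which is what Go's int Scan rejects.

```python
import sqlite3

# MAX(id) on an empty table returns a single row containing NULL,
# mirroring the NULL that Go's Scan into an int rejects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (id INTEGER PRIMARY KEY)")
(max_id,) = conn.execute("SELECT MAX(id) FROM Events").fetchone()
# max_id is None here
```

A third option would be COALESCE(MAX(id), 0) in the query, or scanning into sql.NullInt64 on the Go side, though both change the query/scan code rather than the error handling.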
home.tsx currently serves two purposes: it contains a list of recent searches but also contains the actual search functionality. When you perform a search it is handled on the client side and the URL is not updated, which means that when you refresh the page your search results are gone. You also cannot send links to searches.
home.tsx should be split into two page components, Home and Search.
Home should work as it does currently, but instead of just changing state when clicking the search button or a recent search, it should set location.href to /search with a couple of query parameters.
/search should handle the following query parameters initially:
query - URL encoded version of the search string
timespan - the time span
jobId - the job ID of the search
If jobId is set when the Search component mounts, it will make the necessary backend calls to reuse the results of that job. If jobId is not set and query + timespan are set, a new job will be started and the URL will be updated with the new jobId. If the user triggers a new search, all three query parameters should be updated.
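The URL construction itself is just query-string encoding. A sketch in Python for brevity (in the actual frontend this would be URLSearchParams in TypeScript; the timespan value format here is a guess):

```python
from urllib.parse import urlencode

# Build the /search URL the Home component would navigate to.
# Parameter names follow the issue text; values are illustrative.
params = {
    "query": 'userid | rex "userid was (?P<userid>\\d+)\\."',
    "timespan": "last-15-minutes",
}
url = "/search?" + urlencode(params)
# url starts with /search?query= and ends with the timespan parameter
```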
At some point, Logsuck should support structured logging in the form of JSON.
Logsuck should handle both field extraction from JSON logs as well as have a frontend that shows the structure of the JSON object in a better way than just dumping it as a string.
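The field extraction half could start out as a simple flattening of the top-level JSON keys into string fields. A hypothetical sketch (key names and the string-valued field model are illustrative, not logsuck's actual design):

```python
import json

# Hypothetical flattening of a JSON log event into searchable fields.
raw = '{"level": "info", "userid": 123, "msg": "The user did something."}'
event = json.loads(raw)
fields = {key: str(value) for key, value in event.items()}
# fields maps each top-level key to its stringified value
```

Nested objects would need a policy of their own (dotted keys, or keeping the subtree for the structured frontend view).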
A couple of things that need doing:
Two problems:
FileWatcher probably needs a little bit of a rethink. fsnotify should be used on the parent directory maybe?
When searching for events, they are retrieved in batches of 1000 using OFFSET/LIMIT. The reason we can't retrieve all results in one query is that SQLite locks itself while the query is running, so we can't write the results to the JobResults table while we have the search rowset open.
The current implementation becomes slower and slower the more results we find because OFFSET gets more expensive the bigger the offset is. By using https://stackoverflow.com/questions/14468586/efficient-paging-in-sqlite-with-millions-of-records we should be able to speed up the search significantly. Testing directly against the database with ~1M events, retrieving page 200 went from taking 600ms to 3ms.
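The keyset ("seek") approach from the linked answer replaces OFFSET with a filter on the last id seen, so each page costs the same regardless of how deep you are. A minimal sketch against a stand-in Events table:

```python
import sqlite3

# Keyset pagination: remember the smallest id on the previous page and
# filter on id < last_id instead of using OFFSET.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Events (id) VALUES (?)",
                 [(i,) for i in range(1, 3001)])

def page(last_id, limit=1000):
    return [row[0] for row in conn.execute(
        "SELECT id FROM Events WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_id, limit))]

first = page(last_id=3001)        # ids 3000 down to 2001
second = page(last_id=first[-1])  # ids 2000 down to 1001
```

This maps cleanly onto the existing query, which already filters on e.id < MAX(id) and orders by e.id DESC.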
If you perform a search, go to the second page of results and then refresh the page, currently you will be sent back to the first page of results. You also can't link a specific result page to your friends :(. To fix this, a query parameter named "page" should be added to the URL when the pagination is used to navigate to a different page.
If you start a search from the home page, the URL will be updated so you can link to the search, but if you start a search from the search page this does not happen, making the search unlinkable.
SQLite has support for compressing FTS tables: https://sqlite.org/fts3.html#the_compress_and_uncompress_options
This should be supported by Logsuck to prevent the database file size from growing out of control. Depending on how much it affects insert performance this might need to be configurable so that users with high throughput needs can disable it.
hi,
does logsuck support the following features?